Abstract



A framework for the mathematical modeling of evolution in group structured
populations is introduced. The population is divided into a fixed large number of groups
of fixed size. From generation to generation, new groups are formed that descend from
previous groups, through a two-level Fisher-Wright process, with selection between
groups and within groups and with migration between groups at rate m. When m = 1,
the framework reduces to the often used trait-group framework, so that our setting 
can be seen as an extension of that approach. Therefore our framework is sufficiently 
flexible to allow the analysis of many previously introduced models in which altruists 
and non-altruists compete, and provides new insights into these models. We focus
on the situation in which initially there is a single altruistic allele in the population,
and no further mutations occur. The main questions are the conditions under which
that altruistic allele is viable, and the fashion in which it spreads when it does.
Because our results and methods are mathematically rigorous, we see them as shed- 
ding light on various controversial issues in this field, including the role of Hamilton's 
rule, and of the Price equation, the relevance of linearity in fitness functions, the need 
to only consider pairwise interactions, or weak selection, etc. In the current paper 
we analyze the early stages of the evolution, during which the number of altruists is 
small compared to the size of the population. We show that during this stage the 
evolution is well described by a multitype branching process. The driving matrix 
for this process can be readily obtained, reducing the problem of determining when 
the altruistic gene is viable to a comparison between the leading eigenvalue (Perron- 
Frobenius eigenvalue) of that matrix, and the fitness of the non-altruists before the 
altruistic gene appeared. This leads to a generalization of Hamilton's condition for 
the viability of a mutant gene. That generalized viability condition can be interpreted 
in an appropriate neighbor modulated fitness sense, providing a gene's eye view of 
the generalized rule. Our generalized Hamilton rule reduces to the traditional one for 
public goods games, and more generally under the condition of linearity of the fitness 
of each carrier of the gene A as a function of the number of copies of that gene in the 
same group. Our analysis also suggests a broadly applicable criterion, that we make
explicit, for the viability of a mutant gene, in a more general setting. Our generalized 
Hamilton condition simplifies considerably when selection is weak, and further when 
groups are large. We analyze a significant number of examples, and observe that 
the altruistic gene can spread under relatively low levels of relatedness in the groups,
corresponding to relatively high levels of migration. This happens, for instance, when 
the fitness of individuals is affected by repeated activities in their groups, and the 
altruistic mutant gene promotes cooperation in each round in a fashion that is condi- 
tional on the behavior of the group members in previous rounds. This class of models 
is a natural extension to the group structured population setting of tit-for-tat and 
related conditional strategies in the iterated two player setting. We propose that this 
kind of conditional altruistic behavior in groups be investigated as a possible route 
for the spread of altruistic behavior through natural selection. 
1 Core results



We introduce a stylized framework for studying the evolution of a group structured
population. Our goal is to shed light on and clarify issues in the ongoing debate on the interplay
between group selection and kin selection. We will focus here on the application of the 
framework to the question of the spread of an altruistic gene A, resulting from a mutation, 
in the absence of further mutations. This is the central issue addressed in the debate, 
and is well suited for introducing our framework. Comments on other natural applications 
of the framework will be made at various places in this paper. For background material, 
and a significant sample of work addressing altruism, cooperation, group selection and kin 
selection, from different perspectives, we refer the reader to the papers/books listed in the 
reference section (except for [11] and [28]) and references therein. 

We conceived our approach in the spirit of basic stylized frameworks in population
genetics, like the Fisher-Wright framework with selection. By this we mean that we aimed
at keeping the elements in the modeling mathematically precise, and as simple as possible, 
provided they would still capture the basic biological features that one wants to study. 
Central to the contribution in the present work is the fact that rigorous mathematical 
methods can be used to decide the fate of the mutant gene A. This allows us to compare 
our rigorous conditions for the spread of altruism with basic concepts and issues including 
Hamilton's rule, the Price equation, neighbor modulated fitness computations, the com- 
patibility between a gene's eye view and a group selection mechanism, whether pairwise 
interactions, linearity of fitness functions, or weak selection have to be assumed, the
possibility that altruism spreads in group structured populations when the migration rate is
significantly higher than the inverse of the group size, etc. We believe that our results help
to clarify these issues, and others that are being debated, and we hope that they will bring
some consilience to this field, allowing for a greater level of collaboration among the various
groups contributing to the area.

Our framework can be seen as a mathematically precise version of what in [10], p. 6737, 
is called a "typical kin selection model". One of our main goals was to develop methods 
that apply to much more general fitness functions than those considered there, and that 
do not require the assumption that selection is weak. 

Our framework is also a natural extension of the classical trait-group framework (for the 
origins of this framework see [82]) and therefore allows for the analysis of the models that 
have previously been studied in that framework. What distinguishes our framework from 
the trait-group one is a migration rate parameter $0 \le m \le 1$, with the case $m = 1$ reducing
to the trait-group framework. When $m < 1$, our framework introduces assortment, in the
sense that offspring of members of the same group tend to stay together. The migration
parameter m determines the strength of this assortment; the smaller it is, the stronger the
assortment.

Under natural conditions on the altruistic gene A, to be specified later, we show that
there are two critical values of the migration rate m, namely $0 < m_f < m_s < 1$, playing roles
as follows. For $m_s < m \le 1$ the gene A is eliminated. For $0 \le m < m_f$ the gene A has a
positive probability of fixating, replacing the wild allele N. In the intermediate regime, when
$m_f < m < m_s$, the outcome is model dependent, but typically there is a positive probability
for the mutant gene A to spread and reach a polymorphic equilibrium with the wild allele
N. In this paper we will focus on the critical point $m_s$ (the subscript s stands for survival
of the mutant allele A), and the corresponding mathematically rigorous conditions for the
spread of altruism in our framework. Results on $m_f$ and the corresponding conditions
for fixation of A, as well as results on the evolution of the frequency of the gene A in
the population when it spreads (either in the intermediate polymorphic regime, or in the
fixation regime) will be presented elsewhere.

For the reader's benefit, and for brevity, we will not present here the full rigorous 
mathematical proofs, but rather explain why the various results are true at a level that, 
we hope, will make them generally quite intuitive. We emphasize that the results are 
mathematically rigorous and hold for any strength of selection. In the special case of weak 
selection simplifications occur and will also be discussed. 

We consider a population in which individuals live in a large number g of groups of 
size n. Individuals are of two genetically determined phenotypic types, the wild N and the 
altruistic mutant A. Reproduction is asexual and the type is inherited without mutation by 
the offspring. Each individual has a relative fitness that depends on its type and the types
of the other members of its group (the idea being that altruists, at a cost to themselves,
provide a benefit to the members of their group). The relative fitness of an altruist, and
that of a non-altruist, both in a group that has a total number k of altruists, will be written,
respectively, as

$$w^A_k = 1 + \delta v^A_k, \qquad w^N_k = 1 + \delta v^N_k,$$

with the convention that $v^N_0 = 0$, i.e., $w^N_0 = 1$. The quantities $v^A_k$ and $v^N_k$ represent payoffs
to altruistic, or non-altruistic behavior. The parameter $\delta \ge 0$ indicates the strength of
selection, with the limit $\delta \to 0$ corresponding to the limit of weak selection, and the case
$\delta = 0$ corresponding to the case in which there is no selection, only neutral genetic drift.
Examples of payoff functions will be provided in Section 2. See also Fig. 2. 

Evolution operates as the next generation is formed through a process that involves 
group competition and competition within groups, followed by migration at rate $0 \le m \le 1$,
as summarized in Fig. 1. Competition among groups is idealized as an (intergroup-level)
Fisher-Wright process with selection, described as follows. We associate to each group a
relative fitness given by the average relative fitness of its members. This means that a
group with k altruists has relative group fitness

$$\bar w_k = \frac{k w^A_k + (n-k) w^N_k}{n} = 1 + \delta \bar v_k, \quad \text{where} \quad \bar v_k = \frac{k v^A_k + (n-k) v^N_k}{n}. \tag{1}$$

Each group in the new generation has a parental group from the previous generation, chosen 
independently with probability proportional to group relative fitness. Competition among 
members of a group is described by an (intragroup-level) Fisher-Wright process with
selection, described as follows. The n members of each group in the new generation each have
a parent from among the n individuals in their parental group, chosen independently with
probability proportional to the fitnesses of the members of that parental group. (A
standard probability computation, using conditioning on the parental group, shows that each
individual in the old generation then has an expected number of offspring proportional to
its relative fitness. Conversely, for this to be true, the fitness associated to the groups in the
intergroup-level Fisher-Wright process must be given by (1). The conceptual relevance of
this equivalence is emphasized in [36].) Once the new g groups have been formed according
to this two-level Fisher-Wright process, a fraction m of the individuals migrates from their
group to a randomly chosen group, preserving the constancy of the number n of members 
of the groups. More precisely, each individual, independently of anything else, leaves its 
group with probability m; the migrants then return to the g groups in a random fashion, 
filling vacancies, so that each group has again n members. Each possible way of assigning 
the migrants to the vacancies left in the groups is equally likely, meaning that the migra- 
tion process is completely random. (Mathematically: each individual is independently of 
anything else, with probability m, declared to be a migrant, and one applies then a random 
permutation to the set of migrants.) 
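
The generation step described above (intergroup selection, intragroup selection, then migration) can be sketched as a short simulation. This is only an illustrative sketch: the function name `next_generation`, the public-goods payoffs `v_a` and `v_n`, and all parameter values are assumptions introduced here, not taken from the paper. A population is represented by the list of altruist counts of its g groups.

```python
import random

def next_generation(groups, n, delta, v_a, v_n, m, rng):
    """Advance one generation; `groups` lists the number of altruists per group."""
    g = len(groups)

    def w_a(k):  # relative fitness of an altruist in a group with k altruists
        return 1.0 + delta * v_a(k)

    def w_n(k):  # relative fitness of a non-altruist in a group with k altruists
        return 1.0 + delta * v_n(k)

    def w_bar(k):  # group fitness: average relative fitness of the n members
        return (k * w_a(k) + (n - k) * w_n(k)) / n

    # Intergroup step: each new group picks a parental group with
    # probability proportional to group fitness.
    parents = rng.choices(groups, weights=[w_bar(k) for k in groups], k=g)

    # Intragroup step: each of the n members picks a parent inside the
    # parental group with probability proportional to individual fitness.
    members = []
    for k in parents:
        p_altruist = k * w_a(k) / (n * w_bar(k))
        members.append([rng.random() < p_altruist for _ in range(n)])

    # Migration: mark each individual as a migrant with probability m,
    # then permute the migrants uniformly at random among the vacated slots.
    slots = [(i, j) for i in range(g) for j in range(n) if rng.random() < m]
    migrants = [members[i][j] for i, j in slots]
    rng.shuffle(migrants)
    for (i, j), is_altruist in zip(slots, migrants):
        members[i][j] = is_altruist

    return [sum(group) for group in members]


# Hypothetical public-goods payoffs, for illustration only.
N_SIZE, BENEFIT, COST = 4, 3.0, 1.0

def v_a(k):
    return BENEFIT * k / N_SIZE - COST

def v_n(k):
    return BENEFIT * k / N_SIZE

rng = random.Random(0)
population = [1] + [0] * 999          # one mutant among g = 1000 groups
for _ in range(5):
    population = next_generation(population, N_SIZE, 0.1, v_a, v_n, 0.2, rng)
```

Each group keeps exactly n members because migrants are only permuted among the slots that migrants vacated, which mirrors the random-permutation description of migration.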

In the case m = 1, we can equivalently think that the new groups are formed by random 
assortment from a metapopulation with gn individuals. Each one of these gn individuals 
has a parent chosen independently with probability proportional to relative fitness from the 
gn individuals in the old generation. This is precisely the traditional trait-group framework. 

A model within our framework is specified by giving the values of n and the relative 
fitnesses $w^A_k$ and $w^N_k$. The number of groups g will be considered to be very large,
corresponding to taking the limit $g \to \infty$ in the computations. The study of how finiteness of
g modifies the conclusions is very interesting, but will be deferred to a later investigation. 

Our results on the viability of a single gene A to spread do not depend on any conditions 
on the parameters of the model. (These results are summarized in the paragraph that 
contains display (9).) But to keep the presentation more focused and interesting, we will 
assume that conditions (C1) and (C2) below hold, except when stated otherwise.

(C1) $v^A_1 < 0$, i.e., $w^A_1 < 1 = w^N_0$, so that an isolated type A individual has lower fitness
than the wild type N has in groups without altruists.

(C2) $v^A_n > 0$, i.e., $w^A_n > 1 = w^N_0$, so that type A individuals have greater fitness when in
single-type groups than type N individuals have when in single-type groups.

Condition (C1) is sometimes referred to, after [8.')], as the condition for A to be called
"strongly altruistic". This condition means that an isolated gene A is at a disadvantage
with respect to the wild type N in the population at large. In the trait-group framework,
m = 1, this condition makes it impossible for A to invade. We will see that, as expected,
this condition makes it impossible for this gene to invade also when m is close to 1, so that
$m_s < 1$.

On the other hand, we will see that condition (C2) is sufficient for A to spread when m
is close to 0, so that $m_s > 0$.

We will study the evolution of the population, when started in generation 0 from the
situation in which only one individual is of type A. Naturally this refers to the situation in
which a mutation from N to A has just occurred, and to the assumption that the mutation
rate is so low that no further mutations will occur before the fate of that mutant gene has
been decided. Obviously the mutant A may disappear in a few generations, but we want
to determine here when it is viable, in the sense that it has a fair chance of spreading. We
will denote by $N^A(t)$ the number of altruists in generation t. In case $\delta = 0$, meaning that
the mutation is neutral, the expected number of altruists remains constant, $\mathbb{E}(N^A(t)) = 1$,
and a standard martingale argument gives probability $1/(ng)$ to the event that A will not
disappear, but rather fixate eventually. As $g \to \infty$ this probability vanishes. When $\delta > 0$,
and condition (C1) holds, prospects are even worse for the mutant gene A in generation
1. The expected number of type A individuals then is $\mathbb{E}(N^A(1)) = w^A_1 < 1$. When $m = 1$,
these bad prospects worsen with time, since in the first few generations the possible type
A individuals are likely to all be in different groups (since g is large), and so are always
carrying the same fitness $w^A_1 < 1$. This leads to $\mathbb{E}(N^A(t)) = (w^A_1)^t$, and to the certain
elimination of the gene A. In the opposite extreme, when $m = 0$, a group with n altruists may be created
by chance in a few generations. Groups that descend from this one will always have only
type A individuals, who therefore have average fitness $w^A_n > 1 = w^N_0$, by (C2). In this
situation it is reasonable to expect that the altruistic gene can spread with a probability
that does not vanish as $g \to \infty$. The rigorous analysis of what happens in this case and in
the more important case $0 < m < 1$ can be done using the theory of multitype branching
processes, as covered, for instance, in Chapter II of [28]. We turn next to the application
of that theory to solving our problem.

In applying multitype branching process theory here, we must emphasize that that 
theory describes well the evolution in our framework only in its early stage. By early stage, 
to be abbreviated E.S., we mean the generations before the number of groups that contain 
altruists is comparable to g. We will nevertheless see that this early stage period is of order 
$\log g$ generations, so that it covers a large number of generations since g is large.

We say that a group is of type k if it has exactly k altruists. First we explain how 
multitype branching theory can be used when m = 0. In the E.S., there are few groups 
with altruists, compared with the total number g of groups. Therefore in the intergroup 
Fisher-Wright process the competition among groups with altruists is basically irrelevant;
groups are mostly competing with the groups without altruists, that form the background
on which groups with altruists may or may not spread. To see this, note first that in each
generation there are g new groups being formed, and that they choose their parental groups
independently with probability proportional to group fitness. Since the vast majority of
the groups have no altruists, and therefore group fitness $\bar w_0 = w^N_0 = 1$, a type k group has a
probability close to $\bar w_k/g$ of being the parent of each new group. Hence, each group of type
k, with $k \ge 1$, can be seen, in first approximation, as creating independently of the other
ones a number of offspring groups that is given by a binomial distribution with parameters
g and $\bar w_k/g$ (well approximated by a Poisson distribution with mean $\bar w_k$). During the E.S.,
there are far fewer than g groups with altruists, and they are each producing a number of
offspring groups that is also small compared to g. For this reason each one of these groups
interferes little with the other groups with altruists in their creation of offspring groups.
This independence in the creation of offspring groups is what defines a multitype branching
process. The next fact to observe is that each group that has as its parental group a group
of type k will be of type k', due to the intragroup Fisher-Wright process, with probability

$$p(k, k') = \mathbb{P}\left(\mathrm{Bin}\left(n, \frac{k w^A_k}{n \bar w_k}\right) = k'\right),$$

where Bin(n, p) is a binomial random variable with n attempts, each with probability p
of success. Assembling the pieces above, we conclude that, through the two-level
Fisher-Wright process, a group of type k creates on average

$$M_{k,k'} = \bar w_k \, p(k, k') \tag{2}$$

groups of type k' in the next generation, independently of anything else. When $m = 0$
this is the whole story. The matrix M, of size $n \times n$, defined by (2), with $k = 1, \dots, n$,
$k' = 1, \dots, n$, characterizes the evolution of this process.
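
As a concrete illustration of the matrix defined by (2), the following minimal sketch builds M for given payoff functions. The helper names and the public-goods payoffs are assumptions made here for illustration; only the formula for the entries (group fitness times a binomial probability with success probability $k w^A_k/(n \bar w_k)$) comes from the text.

```python
from math import comb

def binom_pmf(n, p, j):
    """P(Bin(n, p) = j) for 0 <= j <= n."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def build_M(n, delta, v_a, v_n):
    """Matrix (2): M[k-1][k'-1] = w_bar_k * p(k, k') for k, k' = 1..n."""
    M = []
    for k in range(1, n + 1):
        w_a = 1.0 + delta * v_a(k)              # fitness of an altruist
        w_n = 1.0 + delta * v_n(k)              # fitness of a non-altruist
        w_bar = (k * w_a + (n - k) * w_n) / n   # group fitness, as in (1)
        p = min(1.0, k * w_a / (n * w_bar))     # guard against rounding above 1
        M.append([w_bar * binom_pmf(n, p, kp) for kp in range(1, n + 1)])
    return M

# Hypothetical public-goods payoffs, for illustration only.
n, B, C, delta = 4, 3.0, 1.0, 0.1

def v_a(k):
    return B * k / n - C

def v_n(k):
    return B * k / n

M = build_M(n, delta, v_a, v_n)
```

Two sanity checks follow from the definition: the last row is $(0, \dots, 0, w^A_n)$, because a group of n altruists only produces groups of n altruists, and $\sum_{k'} k' M_{k,k'} = k\, w^A_k$, the expected number of altruist offspring of a type k group.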

When m > 0, the creation of the new generation of groups is complicated by migration. 
One could be concerned that a multitype branching process description is no longer feasible. 
Fortunately this fear is unfounded, thanks to the fact that we are only considering the 
E.S., during which $N^A(t) \ll g$. Altruists then form a minute fraction of the migrant
population, and as a consequence it is unlikely that migrant altruists will settle in groups
that contributed altruists to the migrant population, or that any group will receive more
than one migrant altruist. A group that has k' altruists before migration will keep after
migration a random number of altruists given by a binomial distribution with k' attempts
and probability $1 - m$ of success. This means that the probability that after migration this
group is replaced by a group with k'' altruists is given by

$$A_{k',k''} = \mathbb{P}\left(\mathrm{Bin}(k', 1 - m) = k''\right). \tag{3}$$

That group that had k' altruists before migration will also be contributing an expected
number $k'm$ of migrant altruists, who are likely each to settle in a different group that had
no altruists before migration, and has exactly one altruist after migration, i.e., is now of
type 1. This means that the expected number of groups of type k'' created from groups of
type 0 that received altruists from our group that had k' altruists before migration is given
by

$$B_{k',k''} = \begin{cases} k'm, & \text{if } k'' = 1, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

The matrix M should therefore be replaced, due to migration, with the matrix $M(A + B)$
in describing the expected number of groups of type k'' created in the new generation by
each group of type k in the old generation. We will use the notation $N_k(t)$ for the number
of groups of type k in generation t, and also write $N(t) = (N_1(t), \dots, N_n(t))$. In summary,
we have, in matrix notation, that for t in the E.S.,

$$\mathbb{E}(N(t+1) \mid N(t)) = N(t)\, M(A + B). \tag{5}$$
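
The migration matrices (3) and (4) and the product $M(A + B)$ are simple to assemble. The sketch below takes any $n \times n$ matrix M as input; the function names and the small 2 x 2 matrix `M_example` are hypothetical and serve only to exercise the construction.

```python
from math import comb

def binom_pmf(n, p, j):
    """P(Bin(n, p) = j) for 0 <= j <= n."""
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def build_A(n, m):
    """Matrix (3): a group with k' altruists retains k'' of them after migration."""
    return [[binom_pmf(kp, 1.0 - m, kpp) if kpp <= kp else 0.0
             for kpp in range(1, n + 1)]
            for kp in range(1, n + 1)]

def build_B(n, m):
    """Matrix (4): k'*m expected migrant altruists, each creating a type-1 group."""
    return [[kp * m if kpp == 1 else 0.0 for kpp in range(1, n + 1)]
            for kp in range(1, n + 1)]

def driving_matrix(M, m):
    """Return M(A + B) for an n x n matrix M given as a list of rows."""
    n = len(M)
    A, B = build_A(n, m), build_B(n, m)
    return [[sum(M[k][j] * (A[j][c] + B[j][c]) for j in range(n))
             for c in range(n)]
            for k in range(n)]

# Hypothetical 2 x 2 matrix M (n = 2), used only to exercise the function.
M_example = [[0.45, 0.05],
             [0.00, 1.20]]
D_half = driving_matrix(M_example, 0.5)
```

With m = 0 the function returns M itself (A is the identity and B vanishes), and with m = 1 only the first column of the result is nonzero, in line with the limiting cases discussed in the text.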

Obviously $N^A(t) = N_1(t) + 2N_2(t) + \dots + nN_n(t)$. Therefore, the survival of the altruistic
gene is equivalent to the survival of the multitype branching process $N(t)$. Next we describe
the necessary and sufficient condition for the survival with positive probability of this
multitype branching process. In what follows we will suppose that $0 < m < 1$; the cases
$m = 0$ and $m = 1$ can be treated as limits. Because the matrix $M(A + B)$ then has only
strictly positive entries, it results from the Perron-Frobenius Theorem (see, e.g., Theorem
5.1 in Chapter II of [28]) that it has an eigenvalue $\rho$ that is simple, positive and larger in
absolute value than all other eigenvalues. It corresponds to left and right eigenvectors, both
of which have all their entries strictly positive. We will denote by $\nu$ this left-eigenvector,
normalized so as to represent a probability distribution over group types: $\nu_1 + \dots + \nu_n = 1$.
(Illustrations of $\rho$ and $\nu$ as functions of m and $\delta$ appear in Fig. 3, Fig. 5 and Fig. 8, for
various models.) A consequence of (5) and of the Perron-Frobenius Theorem is that for t
large, but still in the E.S.,

$$\mathbb{E}N(t) \approx C\rho^t \nu, \qquad \mathbb{E}N^A(t) \approx C\rho^t \sum_k k\nu_k, \tag{6}$$

where $C > 0$ is a constant.

Theorem 7.1 in Chapter II of [28] states that the survival with positive probability of
the multitype branching process is equivalent to the condition

$$\rho(m) > 1. \tag{7}$$

(We will make m, $\delta$, etc., explicit in the notation $\rho$, etc., only when important.) The
critical value $m_s$ is then obtained by solving the equation

$$\rho(m_s) = 1. \tag{8}$$

(For some numerical examples, see Fig. 3, Fig. 1, Fig. 6 and Fig. 7.) To see that this
equation has a solution in the open interval (0,1), we use continuity of eigenvalues and
eigenvectors. In particular $\rho(m)$ is continuous, and it is enough to observe how it behaves
as m approaches 0 or 1. When $m = 0$, $M(A + B) = M$ has the eigenvalue $w^A_n > 1$ (by
(C2)), corresponding to the left-eigenvector $(0, 0, \dots, 0, 1)$. This implies that $\rho(m)$ must be
larger than 1 when m is close to 0. When $m = 1$, the matrix $A = 0$, and $M(A + B) = MB$
has only the first column not identically 0. Therefore any of its left-eigenvectors with
nonzero eigenvalue must be of the form $(a, 0, \dots, 0)$. When such a vector is multiplied by MB,
the result is $w^A_1 (a, 0, \dots, 0)$. This shows that $w^A_1$ is the only nonzero eigenvalue of
$M(A + B)$. Therefore $\rho(m)$ must converge to $w^A_1 < 1$ (by (C1)), as $m \to 1$.

The argument above shows that (8) has a solution. Uniqueness of this solution is not
guaranteed, unless additional conditions are assumed on the fitnesses. In any case, if there
is more than one solution to (8), the natural definition of $m_s$, that we adopt, is as the
largest one, representing the least strength of assortment of altruists that suffices to allow 
altruism to survive. 
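
The eigenvalue $\rho(m)$, the distribution $\nu$ and the critical value $m_s$ can be approximated numerically. The sketch below is one possible approach, not the authors' procedure: it uses power iteration on the left for $\rho$ and $\nu$, and then scans a grid downward from m = 1 followed by bisection to locate the largest solution of $\rho(m) = 1$. The function names, the grid size and the public-goods payoffs are assumptions made for illustration; the scan assumes $\rho(1) \le 1$, as under (C1), and could miss a pair of crossings that falls between two grid points.

```python
from math import comb

def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1.0 - p)**(n - j)

def driving_matrix(n, delta, v_a, v_n, m):
    """M(A + B) from (2)-(4), rows and columns indexed by group types 1..n."""
    D = []
    for k in range(1, n + 1):
        w_a = 1.0 + delta * v_a(k)
        w_n = 1.0 + delta * v_n(k)
        w_bar = (k * w_a + (n - k) * w_n) / n
        p = min(1.0, k * w_a / (n * w_bar))
        row = [0.0] * n
        for kp in range(1, n + 1):
            weight = w_bar * binom_pmf(n, p, kp)              # entry M[k][k']
            for kpp in range(1, kp + 1):                      # times A[k'][k'']
                row[kpp - 1] += weight * binom_pmf(kp, 1.0 - m, kpp)
            row[0] += weight * kp * m                         # times B[k'][1]
        D.append(row)
    return D

def perron(D, iters=5000, tol=1e-13):
    """Left power iteration: returns (rho, nu) with nu normalized to sum 1."""
    n = len(D)
    nu, rho = [1.0 / n] * n, 0.0
    for _ in range(iters):
        new = [sum(nu[k] * D[k][c] for k in range(n)) for c in range(n)]
        total = sum(new)
        new = [x / total for x in new]
        change = max(abs(a - b) for a, b in zip(new, nu))
        nu, rho = new, total
        if change < tol:
            break
    return rho, nu

def critical_migration(n, delta, v_a, v_n, grid=100, tol=1e-10):
    """Largest grid-detectable root of rho(m) = 1, or None if none is found."""
    def rho(m):
        return perron(driving_matrix(n, delta, v_a, v_n, m))[0]
    hi = 1.0
    for i in range(grid - 1, 0, -1):      # scan downward from m = 1
        lo = i / grid
        if rho(lo) > 1.0:
            break
        hi = lo
    else:
        return None
    while hi - lo > tol:                  # bisection on the sign of rho - 1
        mid = 0.5 * (lo + hi)
        if rho(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical public-goods payoffs satisfying (C1) and (C2).
n, B, C, delta = 4, 3.0, 1.0, 0.1

def v_a(k):
    return B * k / n - C

def v_n(k):
    return B * k / n

m_s = critical_migration(n, delta, v_a, v_n)
rho_at_ms, nu_at_ms = perron(driving_matrix(n, delta, v_a, v_n, m_s))
```

Scanning from m = 1 downward matches the convention adopted in the text of taking the largest solution when several exist.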

The theory of multitype branching processes provides us with further detailed
information on the patterns of evolution, when the altruistic gene A survives. Theorem 9.2 of
Chapter II of [28] shows that in the event that the process survives, it behaves in a rather
regular fashion: as t becomes large, the vector $N(t)$ tends to become a multiple of $\nu$, and to
grow at rate $\rho$. (More precisely, the distribution of the random vector $N(t)/\rho^t$ converges to
that of $Z\nu$, where Z is a random variable.) Intuitively, this is a sort of law of large numbers: if the
multitype branching process $N(t)$ survives, the relative frequency of groups of each type
tends to stabilize, as that given by the vector $\nu$, as $N^A(t)$ becomes large. But randomness
in the values of $N_1(t), \dots, N_n(t)$ persists (and is captured by the one-dimensional random
variable Z above), due to the randomness that affects the process in the first few generations,
before large number phenomena can take place.

We summarize now what we have learned about the evolution of the gene A in our 
framework. When (7) fails this gene dies out in a few generations. On the other hand, 
when (7) holds, the picture of its evolution is a dichotomy. Either A is eliminated in a few 
generations, or else it survives and, as it spreads, its distribution stabilizes dynamically, in 
the sense that 

$$N(t) \approx \rho^t Z \nu, \qquad N^A(t) \approx \rho^t Z \sum_{k \ge 1} k\nu_k, \tag{9}$$

when $1 \ll t \ll \log g/\log \rho$. We will refer to this time period as the stationary early stage,
abbreviated S.E.S. Note that the upper bound on the magnitude of t is equivalent to the
condition that during this period $N^A(t) \ll g$. The condition $t \ll \log g/\log \rho$ is then
what defines the E.S., and we will refer to the first few generations, before the S.E.S., as
the very early stage, abbreviated V.E.S. During the V.E.S., $N(t)$ evolves very randomly,
displaying little regularity, since the number of copies of the gene A is small. The random 
variable Z reflects how randomness during that period affects the later S.E.S., and the 
extent to which it is not washed out as the number of copies of A grows and the evolution 
becomes more regular. 

Finally, the later period when $N^A(t)$ is no longer negligible as compared to g (provided
that A has survived) will be called late stage, abbreviated L.S. The evolution of $N(t)$
during the L.S. will no longer be well approximated by the multitype branching process,
and is more challenging to study. To focus in the current paper only on the issue of survival
of the gene A, we will postpone our analysis of that problem to a later publication ([60]).
Here we only observe that in that regime, laws of large numbers allow us to describe
the evolution well as a dynamical system (in n dimensions, representing the fractions of groups
of each type). That dynamical system turns out to be non-linear (due to migration) and to
sometimes have more than one stable equilibrium. Nevertheless, under the condition that
(C2) holds and $\delta > 0$ is small, or some alternative conditions (for instance (C3) in the next
section), when $m > 0$ is small, it has a single stable equilibrium, corresponding to fixation
of the gene A in the population. This is what characterizes the critical point $m_f$. It is
worthwhile to stress in this connection that the linearity of the evolution during the E.S., 
as given by (5), in spite of migration, makes the issue of deciding when the mutant A can 
survive much easier than it would be otherwise. This is what allows the reduction of the 
problem to a standard eigenvalue problem. 

Returning to our analysis of the E.S., we can see (9) as a form of self-organization
of the gene A. If it survives, it arranges itself according to the distribution $\nu$, that is
a left-eigenvector of the driving matrix $M(A + B)$. Left-eigenvectors are precisely the
arrangements that the process can have which are preserved in time. To better appreciate
what is special about $\nu$, and for several future uses, we observe that if $\nu'$ is a left-eigenvector
of $M(A + B)$, with eigenvalue $\rho'$, and no negative entries, then

$$\rho' = \frac{\sum_k k \nu'_k w^A_k}{\sum_k k \nu'_k}. \tag{10}$$

In other words, $\rho'$ is the average fitness of the individuals who carry the gene A (or simply,
the average fitness of the gene A), when the groups with altruists are distributed according
to $\nu'$. Identity (10) is an easy consequence of two observations. First, from (5) we
know that if $N(t) = C\nu'$ for some constant C, then $\mathbb{E}N(t+1) = C\rho'\nu'$. This implies that
$\mathbb{E}N^A(t+1) = C\rho' \sum_k k\nu'_k$. Second, in the multitype branching process an individual
who carries the gene A and belongs to a group with k altruists produces an expected number
of offspring $w^A_k$, so that we can also write $\mathbb{E}N^A(t+1) = C \sum_k w^A_k k \nu'_k$. Comparison of these
expressions yields (10).
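
The identity $\rho' \sum_k k\nu'_k = \sum_k k\nu'_k w^A_k$ can also be checked directly from the definitions of M, A and B in (2), (3) and (4). The short derivation below is offered as a verification sketch under those definitions, with $\bar w_k$ the group fitness from (1); it is not taken from the paper.

```latex
% Each of the k' altruists either stays (probability 1-m) or migrates and
% founds a type-1 group, so the number of altruists is conserved on average:
\sum_{k''=1}^{n} k''\,(A+B)_{k',k''} \;=\; k'(1-m) + k'm \;=\; k'.

% Mean of the intragroup binomial, weighted by the group fitness:
\sum_{k'=1}^{n} k'\,M_{k,k'}
  \;=\; \bar w_k \cdot n \cdot \frac{k\,w^A_k}{n\,\bar w_k}
  \;=\; k\,w^A_k .

% Hence, for a left-eigenvector \nu' of M(A+B) with eigenvalue \rho':
\rho' \sum_{k} k\,\nu'_k
  \;=\; \sum_{k''} k''\,\bigl(\nu' M(A+B)\bigr)_{k''}
  \;=\; \sum_{k} \nu'_k\,k\,w^A_k .
```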

We combine (7) with (10) to write the necessary and sufficient condition for viability 
of the gene A as 

$$\frac{\sum_k k \nu_k w^A_k}{\sum_k k \nu_k} = \rho > 1. \tag{11}$$

Since $\nu$ is the only left-eigenvector (up to normalization) of the maximal eigenvalue $\rho$, (9) is telling us that
when the altruistic gene survives, it tends to organize itself (or we can also say "nature
organizes it, through natural selection") in the stable way that maximizes its average fitness.
This observation also makes the condition of survival (11) look particularly natural. If
$\rho \le 1$, there is no stable way for the gene A to be organized so that it will have an average
fitness that is larger than that of the wild type N in the population at large, where A is
still rare; A will then not be viable. On the other hand, when $\rho > 1$, the gene A can be
organized according to $\nu$, that is stable, and provides it with mean fitness larger than 1, as
needed for it to spread among the wild type N.

It is enlightening to see what happens when the inequality in (11) fails. Even in this case, 
chance may produce at a time t during the V.E.S. the arrangement $N(t) = (0, \dots, 0, 1)$,
meaning that there are exactly n altruists, all in the same group. Condition (C2) tells us
that at time t the average fitness of the gene A is larger than 1. And indeed,
$\mathbb{E}N^A(t+1) = w^A_n n > n = N^A(t)$, so that the altruistic gene is spreading at this time. But this
arrangement is not stable. In successive generations, the distribution of $N(t)$ is driven to
a combination of left-eigenvectors of $M(A + B)$, and $N^A(t)$ can then grow at most at rate
$\rho \le 1$, so that eventually it is eliminated.

In contrast, when (11) holds, chance will dictate if during the V.E.S. an arrangement 
of copies of A will form that not only provides that gene with mean fitness larger than 1, 
but is also likely to produce a succession of arrangements in the next generations, all with 
this property. The important point is that, under (11), such arrangements do exist, and 
drive the evolution towards $\nu$.

The contrast in the last two paragraphs is one of the main lessons from our analysis.
This lesson goes beyond the specific aspects of the stylization that we are adopting here. 
We suggest making this idea explicit as a guiding criterion, that should be of use when 
considering any framework, model or experimental situation. 

Viability or Survivability Criterion: In a large population, a single mutant gene A, in the 
absence of further mutations, will be viable, i.e., will have a positive probability of surviving 
and spreading, if and only if this gene can produce in a few generations an arrangement of 
its copies in a number that is still small compared to the size of the population, but is likely 
to produce in the next generations a sequence of arrangements with a growing number of 
its copies, until it accounts for a non-negligible fraction of the alleles in the population. 

We will refer to such a sequence of arrangements as a survival mechanism for the mutant
gene A, so that the criterion stated above postulates the existence of a survival mechanism
as a necessary and sufficient condition for the viability of the mutant gene A. Such a
mechanism can, for instance, be started by an arrangement of copies of the gene A that
satisfies the three conditions below:

(i) In this arrangement the average fitness of the mutant gene is larger than that
of the wild type in the population at large, before the mutant appeared.

(ii) This arrangement is likely to produce in the next generation another arrangement 
with the same property (i) above. 

(iii) Due to the growth in the number of copies of the gene, the probability of success in 
step (ii) above increases from generation to generation, fast enough to assure that 
the probability of producing the sequence of arrangements mentioned in the criterion 
is large. 

Indeed, in our framework, once an arrangement of copies of A is produced with distribution
in groups close to $\nu$, conditions (i), (ii), (iii) are fulfilled, provided that $\rho > 1$. On the other
hand, when $\rho < 1$, no arrangement exists that satisfies these three conditions.

We end this section with some observations on generalizations of our methods. One can 
modify the intergroup and the intragroup competition procedures, from the Fisher-Wright
ones that we consider here, and in this way extend our framework further. For instance,
modifications to the intragroup selection procedure can be fairly general, and would only
require a modification of the matrix p(k, k'). Instances of such a modification could include
domination patterns in the intragroup reproduction mechanism, that result in reproductive 
skew within the group. In an extreme case, a single member of the group, chosen at random 
with probabilities proportional to individual fitness of the group members, could mother 
all the n offspring of an offspring group. 

Modifications of the intergroup competition mechanism are even simpler to consider.
Note that we did not use in our analysis the full power of the assumption that this mechanism
is a Fisher-Wright procedure. We only assumed that if altruists are rare, then groups
with altruists each father in the next generation an almost independent random number of
groups, with mean proportional to group fitness (defined as average fitness of group members).
Under these broad assumptions our methods and results above, and in the remainder
of this paper, are unchanged. We chose to introduce our framework with a Fisher-Wright
competition mechanism among groups for concreteness. This choice forces the number of
offspring groups of each group to be binomially distributed (well approximated by Poisson,
since g is large). The observation in the current paragraph is then of special relevance in
situations in which the number of groups fathered by each group is better modeled by a
distribution that is far from Poisson, as for instance in cases in which its variance is much
smaller than its mean (as happens, for example, when $\delta$ is small, the mean is close to 1,
and the variance is much smaller than 1, with most groups fathering exactly one group).

2 Models 

In this section we will introduce several models and discuss their relevance. Fig. 2 provides 
an overview of some of their typical features. Fig. 1, Fig. 6 and Fig. 7 provide values of 
$m_s$ as a function of the strength of selection $\delta$ for some of them. Notice, from these figures,
that $m_s$ is not always monotone in $\delta$, but that it often increases substantially when $\delta$ is
large. This fact highlights the relevance of studying the models not only when selection
is weak. Notice also that in Fig. 6 and Fig. 7, the product $n m_s$ can be of the order of
10. This is relevant in view of the widespread claim that altruism cannot survive when 
nm is significantly larger than 1. As far as we know, this perception resulted from the 
analysis of particular models (e.g., in [13], [2] and [10]) and an excessive emphasis on the 
public goods game (Example 1, below). One of the main messages from this section and 
the following ones will, indeed, be that mechanisms that go beyond the public goods game 
may be central to the understanding of the spread of altruistic genes, and can be analyzed 
in our framework with no special difficulty. 

Conditions (C1) and (C2) are very mild, and basically characterize the effect of the gene
A on its carriers as an individually beneficial social effect which comes with a personal cost
to them. One does not have to restrict oneself to behavioral effects of the gene A on
its carriers' phenotype. For example, another kind of application could include anatomic
and physiologic effects that carry a cost, but produce benefits to those with the altered 
phenotype when in groups with others that share this feature. For instance, gene A could 
promote changes that facilitate verbal communication with others that have the same 
changes, but at a cost, say, in adding expensive tissue to the brain. An isolated carrier of 
gene A would suffer the costs of carrying it, but without the possibility of benefiting from 
its potential advantages. 

In order to assure $m_s > 0$, in a forthcoming paper ([60]), we will either have to assume
that in addition to (C2) holding, $\delta$ is small, or else we will need to add an additional
assumption. A sufficient one will be:

(C3) $v^A_n = \bar v_n > \bar v_k$, i.e., $w^A_n = \bar w_n > \bar w_k$, for $k = 0, 1, \dots, n-1$, so that the average fitness
of a group is maximized when the group contains only altruists.

When the gene A affects behavior, what precise conditions on the fitnesses $w^A_k$ and $w^N_k$
should be required for this gene to be called "altruistic"? There is no agreement on the
answer. The issues are very nicely presented and discussed in [ 5'^]. We list next a few of 
the conditions that can naturally be associated with altruism. These ones, and a few more 
can be found in [ ; ], where detailed references and credit are given. 

Some of the conditions require the altruistic behavior to be beneficial. Typical conditions
of this kind are:

(C4) $v^A_k$, or equivalently, $w^A_k$, is increasing in $k = 1, \dots, n$, so that altruists are always
better off sharing their group with more altruists.

(C5) $v^N_k$, or equivalently, $w^N_k$, is increasing in $k = 0, \dots, n-1$, so that non-altruists are
always better off sharing their group with more altruists.

(C6) $\bar v_k$, or equivalently, $\bar w_k$, is increasing in $k = 0, \dots, n$, so that the members of a group
are on average better off with more altruists in the group. (Note that (C6) is an
extension of (C3).)

And complementary conditions require the altruistic behavior to come at a cost to the 
actor: 

(C7) $v^A_k < v^N_k$, i.e., $w^A_k < w^N_k$, $k = 1, \dots, n-1$, so that altruists are always worse off than
non-altruists in the same group.

(C8) $v^A_{k+1} < v^N_k$, i.e., $w^A_{k+1} < w^N_k$, $k = 0, \dots, n-1$, so that an individual that suffered a
mutation from N to A would be worse off.

Condition (C8) extends condition (C1). It is known that when it holds, then in the trait-group
framework, m = 1, starting from any fraction p < 1 of genes A in the population,
these genes will be eliminated. ([37] attributes this result to [ ].) The assortment provided 
by a low m is nevertheless sufficient to allow a single altruistic gene A to invade, if condition 
(C2) holds. 

We will illustrate the use of our framework with several examples (see Fig. 2): 
Example 1. Public goods game: 

$v^A_k = -C + (k-1)B/(n-1)$,
$v^N_k = kB/(n-1)$,

for positive constants C and B. One can think that at a cost C to itself, each altruist
provides a benefit $B/(n-1)$ to each one of the other members of its group. Alternatively,
one can think that at a cost c to itself, each altruist provides a benefit $b/n$ to each member
of its group, itself included. Set then $C = c - b/n$, for the net cost to the altruist, and
$B = b(n-1)/n$, for the total benefit to the other members of the group. Each one of these
two descriptions is common in nature and has its theoretical advantages, and both appear often
in the literature (see [55] for more on this point). The former description is often referred
to as an "other-only" trait, and the latter one as a "whole-group" trait. Their
mathematical equivalence illustrates something that presents itself a number of times. Two
models may be different in relevant biological aspects, but lead to the same functions $v^A_k$ and
$v^N_k$, possibly after some change of variables, as above. In this case, we will say that the
models are materially different, but formally equivalent.

There is a second way in which the present example splits into two materially different, 
but formally equivalent descriptions. On one hand, altruists could be performing individual 
actions, producing identical benefits to all the other (or to all, self included) members of 
the group. In some applications this may be a good description of what is happening. For 
instance, altruists could be individuals with a hygiene habit that is beneficial to the group, 
in preventing disease, but costly to the actor. Alarm calls are another example. 

On the other hand, the fitness functions in this example can also accommodate collective 
actions, in which altruists act together, to produce a common good for the group. Fighting 
in a war against another group, or participating in collective hunting activities (with the 
product of the hunt shared among all members of the group) would be examples of this 
kind. 

In all cases, the assumptions that the total benefit produced, Bk, grows linearly with 
the number k of actors, and the net cost to each actor, C, is constant, may be unrealistic. 
We will explore these points in Example 5, below. 

Note that $\bar v_k = k(B - C)/n$. Condition (C1) holds since $v^A_1 = -C < 0$. We suppose
that $C < B$, so that (C2) holds, since $v^A_n = B - C$. Note that then also (C3)-(C8) all hold.

Numerical results for Example 1 appear in Fig. 1, Fig. 5 and Fig. 8. 
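
The identities stated for Example 1 can be checked numerically. The following is a minimal Python sketch, not part of the original analysis: the function names (`public_goods`, `group_average`) and the parameter values n = 5, B = 3, C = 1 are illustrative assumptions, and vA[k], vN[k] denote the payoffs of an altruist and of a non-altruist in a group with k altruists. Exact rational arithmetic is used only to avoid rounding issues.

```python
from fractions import Fraction

def public_goods(n, B, C):
    """Payoffs of Example 1, indexed by the number k of altruists in the group.

    vA[k] is defined for k = 1..n, vN[k] for k = 0..n-1.
    """
    B, C = Fraction(B), Fraction(C)
    vA = {k: -C + (k - 1) * B / (n - 1) for k in range(1, n + 1)}
    vN = {k: k * B / (n - 1) for k in range(0, n)}
    return vA, vN

def group_average(vA, vN, n):
    """Average payoff of the n members of a group with k altruists, k = 0..n."""
    return [(k * vA.get(k, 0) + (n - k) * vN.get(k, 0)) / n for k in range(n + 1)]

# Illustrative parameters (assumed, not taken from the paper).
n, B, C = 5, 3, 1
vA, vN = public_goods(n, B, C)
vbar = group_average(vA, vN, n)

# Group average equals k(B - C)/n, as stated in the text.
assert vbar == [Fraction(k * (B - C), n) for k in range(n + 1)]
# A lone altruist earns -C; an altruist in an all-altruist group earns B - C.
assert vA[1] == -C and vA[n] == B - C
# (C7): altruists are worse off than non-altruists in the same group.
assert all(vA[k] < vN[k] for k in range(1, n))
# (C8): a mutation from N to A lowers the payoff of the mutant.
assert all(vA[k + 1] < vN[k] for k in range(n))
```

In this model the difference checked in (C8) equals -C for every k, which is why the condition holds regardless of B.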

The public goods game has rightfully been called "the mother of all cooperative models" 
(see footnote 1 in [70]). It is natural to study its behavior, and to understand how the gene 
A can spread in this case. In the next examples we will nevertheless try to convey the 
message that one should aim at developing methods, as we do here, that can address more 
general models, as well. In the spirit of that metaphor, it is natural to see the next example
as "a special daughter of the public goods game". It derives from the public goods game in
the same way that (in a two-player setting) the iterated prisoner's dilemma and the tit-for-tat
strategy derive from a one-shot prisoner's dilemma game and a simple cooperative strategy.

Example 2. Iterated public goods game. Altruists cooperate conditionally: 

$v^A_k = -C + (k-1)B/(n-1)$, if $k \le a$,
$v^A_k = T(-C + (k-1)B/(n-1))$, if $k > a$,

$v^N_k = kB/(n-1)$, if $k \le a$,
$v^N_k = TkB/(n-1)$, if $k > a$,

for positive constants C and B, $T \ge 1$ and $a \in \{1, 2, \dots, n-1\}$. Here we suppose that a
public goods game is repeated a random number of times $\tau \ge 1$, with average $\mathbb{E}(\tau) = T$.
Each time each member of the group can cooperate at a cost C to itself, resulting in a benefit
$B/(n-1)$ to each one of the other members of its group. Defectors incur no costs and
produce no benefits. We suppose that altruists cooperate in the first round, and afterwards
only cooperate if at least a other members of the group cooperated in the previous round.
This is a generalization of the well known tit-for-tat strategy, which corresponds to the
case n = 2, a = 1. We will refer to the strategy of the altruists in this example then as
"many-individuals-tit-for-tat (with threshold a)".

Note that when T = 1, regardless of the value of a, this example is identical to Example
1. Note also that $\bar v_k = k(B - C)/n$ if $k \le a$, and $\bar v_k = Tk(B - C)/n$ if $k > a$. Again,
we suppose that $0 < C < B$, and so both (C1) and (C2) hold, since $v^A_1 = -C$ and
$v^A_n = (B - C)T$. It is also easy to see that then (C3), (C5), (C6) and (C7) hold.

Condition (C4) will only hold under additional assumptions. A very natural one, that 
we will assume, unless stated otherwise, is that the threshold a satisfies 

$-C + aB/(n-1) \ge 0$, (12)

i.e., when altruists keep playing the game, it is never to their disadvantage to do so.

It is instructive to look into what happens with (C8) in detail: $v^A_{k+1} - v^N_k = -C < 0$,
if $k \le a - 1$; $v^A_{k+1} - v^N_k = -CT < 0$ if $k > a$; but in the case $k = a$, $v^A_{a+1} - v^N_a =
T(-C + aB/(n-1)) - aB/(n-1)$. Therefore, if (12) holds as a strict inequality, then
$v^A_{a+1} - v^N_a > 0$, for large T, and (C8) fails. But if (12) fails, or holds as an equality, then
(C8) holds, for arbitrary T. Note that if (12) holds as an equality (which can only occur if
$(n-1)C/B$ is an integer), then all the conditions (C1)-(C8) are satisfied.
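
The case analysis for (C8) in Example 2 can be illustrated with a small Python sketch. It is only a hedged illustration: the function names (`iterated_public_goods`, `c8_holds`) and the parameter values are assumptions made here, chosen so that B/(n-1) = 1 and condition (12) reads a - 1 >= 0. The rule coded below is the one described in the text: payoffs are multiplied by T exactly when the number k of altruists in the group exceeds a.

```python
from fractions import Fraction

def iterated_public_goods(n, B, C, T, a):
    """Payoffs of the conditional-cooperation model, indexed by k altruists in the group."""
    B, C, T = Fraction(B), Fraction(C), Fraction(T)

    def rounds(k):
        # Altruists keep cooperating only when each of them sees at least
        # a other cooperators, i.e. when k - 1 >= a; otherwise one round counts.
        return T if k > a else Fraction(1)

    vA = {k: rounds(k) * (-C + (k - 1) * B / (n - 1)) for k in range(1, n + 1)}
    vN = {k: rounds(k) * k * B / (n - 1) for k in range(0, n)}
    return vA, vN

def c8_holds(vA, vN, n):
    """(C8): a mutation from N to A always lowers the payoff of the mutant."""
    return all(vA[k + 1] < vN[k] for k in range(n))

# Illustrative parameters (assumed): n = 5, B = 4, C = 1, so B/(n-1) = 1.
n, B, C = 5, 4, 1

# (12) strict (a = 2) and T large: (C8) fails at k = a.
print(c8_holds(*iterated_public_goods(n, B, C, T=3, a=2), n))    # False
# (12) as an equality (a = 1): (C8) holds even for very large T.
print(c8_holds(*iterated_public_goods(n, B, C, T=100, a=1), n))  # True
# T = 1 reduces to Example 1, where (C8) holds.
print(c8_holds(*iterated_public_goods(n, B, C, T=1, a=2), n))    # True
```

With a = 2 the critical difference at k = a is T(-C + aB/(n-1)) - aB/(n-1) = T - 2, so in this parameter choice (C8) fails as soon as T is at least 2.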

This model was studied independently in [7] and in [32], in the trait-group framework. 
Both papers identified stable equilibria with positive fractions of altruists (a phenomenon 
that can occur only when (C8) fails). But they also observed that altruists could not invade 
when rare (a phenomenon that always holds under (C1)). In [ ] an approach was then
introduced to provide assortment and allow the gene A to invade when rare. The authors 
concluded that such invasion by gene A could only occur under very restrictive conditions, 
and that therefore this model and the corresponding notion of many-individuals-tit-for-tat 
were of marginal relevance. One of our contributions in the current paper is to rectify this 
perception. In our framework, we obtain values of $m_s$ large enough to indicate that this
model should be seriously considered as a possible mechanism for the spread of altruistic
genes (see Fig. 6 and Fig. 15). Indeed we will see that the estimates in [7] contained
an unreasonably pessimistic assumption, that is not supported in our framework (see last 
paragraph in Section 5). 

From a theoretical point of view, this model is still mathematically simple enough to 
lead to an interesting detailed analysis of the conditions under which the gene A can spread, 
in case selection is weak and the group size n is large, with the threshold a proportional 
to n. This analysis (illustrated in Fig. 16, Fig. 17, Fig. 18, Fig. 19 and Fig. 20) will be a 
good illustration of the simplifications that will be obtained, in Section 5, in that regime. 
We see this example as the prototype for an important class of models, that we make
explicit in Example 6 below, and that we believe should be seriously considered and studied. 

Example 3. Threshold model: 

$v^A_k = -C$, if $k < \theta$,
$v^A_k = -C + A$, if $k \ge \theta$,

$v^N_k = 0$, if $k < \theta$,
$v^N_k = A'$, if $k \ge \theta$,

for positive constants C, A and A', and an integer $\theta \in \{1, 2, \dots, n\}$. The idea here is simple:
the gene A carries a cost, but allows its carriers to gain benefits if sufficiently many are 
in the group. Non-altruists obtain benefits also when altruists do, but we allowed for the 
possibility that those are smaller or larger than those of the altruists. 

This model may be seen as a simplification of Example 2. It shares with it the features 
that when few altruists are present, they incur costs, but when in numbers larger than a 
threshold, have positive payoffs that may be much larger than those costs. The fitnesses in 
this model are sufficiently simpler than those in Example 2, to allow for a more transparent 
analysis of its behavior. We will see that this model is of great value when we discuss 
conceptual issues, including Hamilton's rule. In [70], the case n = 3 of this model (called 
there "stag hunt game" ) was discussed in connection to the conceptual issue of the role of 
Hamilton's rule. This raised a debate in [11] and [71] and further analysis in [20]. We will 
comment on this at the end of Section 3. 

Example 3 is also of great value for comparison purposes, providing meaningful bounds 
on the behavior of more elaborate and realistic models. For instance, the very elaborate 
fitness functions studied in [(>] are well approximated by those in Example 3. We are 
currently reanalyzing the work and ideas from [; ] in the context of our framework, and 
taking advantage of this relationship in that project ([ ]). 

This model can also be seen as a simple instance of another natural class of models that 
we introduce below, in Example 5. 

If $\theta = 1$, then either (C1) or (C2) is violated, since then $v^A_1 = v^A_n = -C + A$. So we
suppose that $\theta \ge 2$. Under this assumption, (C1) is immediate from C > 0, and we suppose
that C < A, so that (C2) is also satisfied. Conditions (C4) and (C5) are clearly satisfied
then. Condition (C7) will be satisfied in case A - C < A'. We have $\bar v_k = -Ck/n$, if $k < \theta$,
and $\bar v_k = A' + (-C + A - A')k/n$, if $k \ge \theta$. So (C3) (and therefore also (C6)) may not be
satisfied. (C3) holds, nevertheless, if A' < A - C. (But even under this assumption, (C6)
fails, unless $\theta = 2$.) As for (C8), it fails, regardless of the value of A', in the same fashion
that it failed (in general) in Example 2, since $v^A_\theta - v^N_{\theta-1} = -C + A > 0$.

Numerical results for Example 3 appear in Fig. 7, Fig. 8 and Fig. 11. As with Example
2, Example 3 also nicely illustrates the simplifications that will be obtained, in Section 5,
when 6 is small and n is large (see (56) and (57)). 
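
The statements about the threshold model can be explored with a short Python sketch. This is a hedged illustration under assumed names and values: `threshold_model`, `group_average`, the symbol `theta` for the threshold, and the parameters n = 6, theta = 3, C = 1, A = 5 are choices made here for illustration. vA[k] and vN[k] are the payoffs of altruists and non-altruists in a group with k altruists.

```python
from fractions import Fraction

def threshold_model(n, theta, C, A, A_prime):
    """Payoffs of the threshold model, indexed by the number k of altruists in the group."""
    vA = {k: (-C + A if k >= theta else -C) for k in range(1, n + 1)}
    vN = {k: (A_prime if k >= theta else 0) for k in range(0, n)}
    return vA, vN

def group_average(vA, vN, n):
    """Average payoff of the n members of a group with k altruists, k = 0..n."""
    return [Fraction(k * vA.get(k, 0) + (n - k) * vN.get(k, 0), n) for k in range(n + 1)]

def c3_holds(vbar):
    """(C3): the all-altruist group has the strictly largest average payoff."""
    return all(vbar[-1] > x for x in vbar[:-1])

# Illustrative parameters (assumed): theta >= 2 and C < A, as required in the text.
n, theta, C, A = 6, 3, 1, 5

# Case A' < A - C: (C3) holds.
vA, vN = threshold_model(n, theta, C, A, A_prime=2)
print(c3_holds(group_average(vA, vN, n)))   # True
# Case A' > A - C: groups with free riders can do better on average, (C3) fails.
vA2, vN2 = threshold_model(n, theta, C, A, A_prime=10)
print(c3_holds(group_average(vA2, vN2, n)))  # False
# (C8) fails regardless of A': the mutant that completes the threshold gains A - C.
print(vA[theta] - vN[theta - 1])             # 4
```

The two values of A' are placed on either side of A - C = 4 to show both regimes discussed in the text.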
Example 4. Additive pairwise interactions (general linear fitness functions):

$v^A_k = a_A + (k-1)a_{AA} + (n-k)a_{AN} = -C + (k-1)B/(n-1) = d_1 + d_2 k$,
$v^N_k = a_N + k a_{NA} + (n-k-1)a_{NN} = kB'/(n-1) = d_3 k$.

Here we suppose that members of the group interact in a pairwise manner throughout their
lives. Each such pair interaction contributes a certain amount to the total payoff of each
one of the two individuals. The contribution from each pairwise interaction to each one of
the two participants depends only on their types. The payoff to a type i interacting with a
type j will be denoted $a_{ij}$, $i, j = A, N$. In addition, each individual has a self contribution,
$a_i$, to its payoff that depends only on its type $i = A, N$. These contributions are added to
produce the final payoff. The result is displayed above, and then rewritten in terms of
$C = -a_A - (n-1)a_{AN}$, $B/(n-1) = a_{AA} - a_{AN}$, $B'/(n-1) = a_{NA} - a_{NN}$, where we
incorporated the assumption that

$a_N + (n-1)a_{NN} = 0$.

This assumption carries no loss of generality, since a constant can be added to all the payoffs
$v^A_k$ and $v^N_k$ without modifying the behavior of our process. (The behavior of the process is
clearly not modified by multiplying all fitnesses, $w^A_k = 1 + \delta v^A_k$, $w^N_k = 1 + \delta v^N_k$, by the same
constant. If we add a constant v to all the payoff functions $v^A_k$, $v^N_k$, the new fitnesses are
equivalent in this sense to the old fitnesses with $\delta$ replaced by $\delta/(1 + \delta v)$.) This condition
amounts simply to our convention that $v^N_0 = 0$. Expressing the fitness functions in terms
of C, B and B' makes their relationship with Example 1 and Example 5 below easy to see.
Finally the result is also rewritten in an equivalent form that emphasizes the nature of the
dependence of the fitnesses on k: they are linear functions. Here $d_1 = -C - B/(n-1)$,
$d_2 = B/(n-1)$ and $d_3 = B'/(n-1)$.
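
The rescaling claim in the parenthetical remark above can be spelled out in one line. The following worked equation is a sketch under the assumption that $1 + \delta v > 0$; here $v_k$ stands for either of the two payoff functions and $v$ is the added constant:

```latex
\begin{align*}
1 + \delta\,(v_k + v)
  &= (1 + \delta v) + \delta\, v_k \\
  &= (1 + \delta v)\left(1 + \frac{\delta}{1 + \delta v}\, v_k\right).
\end{align*}
```

Since the factor $1 + \delta v$ is common to all fitnesses, it does not affect the process, and what remains is the original form of the fitness with strength of selection $\delta/(1 + \delta v)$.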

An important point to make is that given linear fitness functions $v^A_k$ and $v^N_k$, with
$v^N_0 = 0$, as above, they can always be represented in the other two ways, with an appropriate
choice of the constants. For instance, we can take $C = -d_1 - d_2$, $B/(n-1) = d_2$ and
$B'/(n-1) = d_3$, and then take $a_A = a_N = a_{NN} = 0$, $a_{AA} = (B - C)/(n-1)$, $a_{AN} =
-C/(n-1)$, $a_{NA} = B'/(n-1)$.
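
The correspondence between pairwise payoffs and the constants C, B, B' can be checked with a round trip. The Python sketch below is illustrative only: the function names and the values n = 4, C = 2, B = 6, B' = 3 are assumptions made here, and the pairwise payoffs are the particular choice described in the text (self contributions and the N-N payoff set to zero).

```python
from fractions import Fraction

def linear_from_pairwise(n, a_A, a_N, a_AA, a_AN, a_NA, a_NN):
    """Map pairwise payoffs to (C, B, B'), under the normalization a_N + (n-1)*a_NN = 0."""
    if a_N + (n - 1) * a_NN != 0:
        raise ValueError("normalization a_N + (n-1)*a_NN = 0 is violated")
    C = -a_A - (n - 1) * a_AN
    B = (n - 1) * (a_AA - a_AN)
    B_prime = (n - 1) * (a_NA - a_NN)
    return C, B, B_prime

def pairwise_from_linear(n, C, B, B_prime):
    """One choice of pairwise payoffs that reproduces given (C, B, B')."""
    return dict(
        a_A=0, a_N=0, a_NN=0,
        a_AA=Fraction(B - C) / (n - 1),
        a_AN=Fraction(-C) / (n - 1),
        a_NA=Fraction(B_prime) / (n - 1),
    )

# Illustrative values (assumed).
n, C, B, B_prime = 4, 2, 6, 3
p = pairwise_from_linear(n, C, B, B_prime)
assert linear_from_pairwise(n, **p) == (C, B, B_prime)

# The altruist payoff computed pairwise agrees with the linear form for every k.
for k in range(1, n + 1):
    pairwise = p["a_A"] + (k - 1) * p["a_AA"] + (n - k) * p["a_AN"]
    assert pairwise == -C + Fraction((k - 1) * B, n - 1)
```

The representation is not unique: other pairwise payoffs satisfying the same three relations give the same linear fitness functions.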

When n = 2, the current example is the most general possible choice of the payoff
functions $v^A_k$ and $v^N_k$. But this is obviously not the case when $n \ge 3$. For each value of n,
the most general form of the payoff functions are polynomials of degree n - 1.

The linearity of the functions $v^A_k$ and $v^N_k$ will play important roles in relating our results
to other concepts, especially Hamilton's rule. This is one of the reasons this is a major class
of models. The mathematical equivalence between these linearities and having pairwise
additive interactions should not confuse one into thinking that the linearities imply that
the fitnesses must indeed have originated from that special kind of interaction. In Example
1, the members of a group may be interacting in a collective way (hunting together, warfare,
etc). All that the mathematical equivalence says is that the fitnesses obtained there are
the same as those of a (fictitious, in this case) pairwise interaction scenario. This is a good
illustration of two formally equivalent, but materially different models.
There are realistic stories that are conceptually associated to the public goods game,
Example 1, but lead to the more general payoff in the current example, with $B' \ne B$. For
instance, the altruistic activity could be hunting more dangerous but also more nutritious
prey. If the products of the hunts are always shared by the group, we have the model in 
Example 1. But if the hunters are able to consume the best part of the hunt, before sharing 
the rest with the group, we would have B' < B. 

While materially different from the public goods game, Example 1, the current example
is formally equivalent to it when the following equivalent conditions hold:

$B = B'$, $a_{AA} - a_{AN} = a_{NA} - a_{NN}$, $a_{AA} - a_{NA} = a_{AN} - a_{NN}$. (13)

Condition (13) is known as "equal gains from switching", since the payoffs in the 2x2
matrix $a_{ij}$, $i, j = A, N$, change by the same amounts if one switches strategies, regardless
of what the other player is doing. Unfortunately the terminology "additivity condition" or
"linearity condition" is also used in the literature for (13). This is confusing, since in our
context, additivity refers to the fact that the payoffs $v^A_k$ and $v^N_k$ are obtained additively
over the n - 1 pairwise interactions that each individual has with the other members of its
group. This additivity has no relationship with (13). And in our context, linearity refers to
the linearity of $v^A_k$ and $v^N_k$ as functions of k. As we explained above, in the mathematically
standard way in which we are using the terms pairwise additivity and linearity, they are
equivalent to each other, and logically independent of (13).

Under condition (13), it is common to use the representation $a_{NN} = 0$, $a_{AA} = -c + b$,
$a_{AN} = -c$, $a_{NA} = b$. When b > c, this is a classical prisoner's dilemma. It corresponds to
$C = (n-1)c$, $B = B' = (n-1)b$.

The matrix $a_{ij}$ represents the lifetime payoff for each pairwise interaction. This lifetime
payoff may result from the accumulation of payoffs from iterated games. In this way we 
can see that the setup in this example is flexible enough to accommodate a gene A that 
produces a conditional behavior over such iterated games, like, for instance a tit-for-tat 
strategy. For this, suppose that each pair of individuals interact repeatedly with payoffs 
given by the standard prisoner's dilemma matrix. If type N always defects, and type A 
uses a tit-for-tat strategy, we have $a_{NN} = 0$, $a_{AA} = (-c + b)T$, $a_{AN} = -c$, $a_{NA} = b$, where
T is the average number of repetitions of the basic interaction over a lifetime. If T > 1,
(13) fails, and we have

$v^A_k = -(n-1)c + ((b-c)T + c)(k-1)$, $v^N_k = bk$. (14)

It is common to write B - B' = D and call it a "synergy" term. It represents an
additional benefit (possibly negative) to altruists when interacting with other altruists.
In the iterated pairwise prisoner's dilemma game with A playing tit-for-tat, (14), we have
$D = (b-c)(T-1)(n-1)$.
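
The synergy term for the iterated pairwise game can be recomputed from the lifetime payoff matrix. The Python sketch below is a hedged illustration: the function names (`tit_for_tat_pairwise`, `cost_benefit_synergy`) and the values n = 5, b = 3, c = 1, T = 4 are assumptions introduced here, and the self contributions of both types are taken to be zero.

```python
def tit_for_tat_pairwise(b, c, T):
    """Lifetime pairwise payoffs when A plays tit-for-tat and N always defects.

    Keys give (own type, partner type); T is the average number of repetitions.
    """
    return {"AA": (b - c) * T, "AN": -c, "NA": b, "NN": 0}

def cost_benefit_synergy(n, a):
    """Return (C, B, B', D) for pairwise payoffs a, assuming zero self contributions."""
    C = -(n - 1) * a["AN"]
    B = (n - 1) * (a["AA"] - a["AN"])
    B_prime = (n - 1) * (a["NA"] - a["NN"])
    return C, B, B_prime, B - B_prime

# Illustrative values (assumed).
n, b, c, T = 5, 3, 1, 4
C, B, B_prime, D = cost_benefit_synergy(n, tit_for_tat_pairwise(b, c, T))
assert D == (b - c) * (T - 1) * (n - 1)

# With a single repetition (T = 1) the synergy vanishes and B = B' = (n-1)b.
C1, B1, B1_prime, D1 = cost_benefit_synergy(n, tit_for_tat_pairwise(b, c, 1))
assert D1 == 0 and B1 == B1_prime == (n - 1) * b and C1 == (n - 1) * c
```

The second check recovers the one-shot prisoner's dilemma case, in which the equal-gains-from-switching condition holds.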

Deciding when each one of the conditions (C1)-(C8) holds in the current example is
tedious and not so relevant. We observe only a few facts. Assuming $0 < C < B$, conditions (C1),
(C2) and (C4), which depend only on $v^A_k$, hold. If B' > 0, then also (C5) holds. The other
conditions depend on how B' relates to B and C. We just make the simple remark that if
$0 < C < B$ and B' is close enough to B, then all the conditions (C1)-(C8) hold, since they
hold with slack when $0 < C < B = B'$ (Example 1).

In Example 1, we observed that the assumed linearity of $v^A_k$ and $v^N_k$ there may not
be realistic. The same observation holds about pairwise additivity of interactions. Many
interactions in a group are between pairs, but it is not always clear that their effects on
fitness should be additive. When an individual interacts repeatedly with another member
of the group, like in the story that led to (14), it may not be able to interact as often with
the other members of the group. Also, the beneficial effects of the pairwise interactions
may saturate, and be sub-additive, rather than additive. Additivity/linearity is mathematically
a natural first level simplification/approximation. But one should be aware of its
limitations. With this in mind, we turn to the next example.

Example 5. Variable costs and benefits: 

$v^A_k = -C_k + (k-1)B_k/(n-1)$,
$v^N_k = kB'_k/(n-1)$.

Remarks in Examples 1 and 4, above, motivate this class of examples. Without further
assumptions on the cost and benefit functions, $C_k$, $B_k$ and $B'_k$, any model can be fit into
this form. So what we are proposing here is first a convenient notation for comparing
models. Next we discuss some interesting assumptions on the cost and benefit functions.

It is very natural, in various applications, to assume that $C_k$ is non-increasing and $B_k$
and $B'_k$ are non-decreasing. If the gene A prompts its carriers to act in some collective
way, it is often the case that the cost to each participant decreases with the number of
participants, while the total benefits produced grow faster than linearly with the number
of participants. This is called an increasing return to scale. Reasonable assumptions can be
that $C_k$ decreases as a power of k, $C_k = C/k^{\alpha_1}$, for some constant $\alpha_1 > 0$, while $B_k = \alpha_2 k^{\alpha_3}$
and $B'_k = \alpha_4 k^{\alpha_5}$, with $\alpha_2 > 0$, $0 < \alpha_3 < 1$, $\alpha_4 > 0$, $0 < \alpha_5 < 1$.

Another distinct assumption on the benefit functions is that $(k-1)B_k/(n-1)$ and
$kB'_k/(n-1)$ first grow slowly with k, then steeply (close to a threshold value of k) and
then more slowly again, as the gains from scale saturate. For instance this will happen if
$B_k = bk/(1 + dk^2)$, $B'_k = b'k/(1 + d'k^2)$, with positive constants b, d, b', d'.

An interesting class of models covered by the current example is the object of [29]. 
Experimental results with microbes often indicate the need to consider non-linear payoff 
functions, as those discussed in the current example; see, for instance [8] and [63]. 

In case of a collective action that requires a minimum number $\theta$ of participants, we
should have $B_k = B'_k = 0$, for $k < \theta$. And for larger values of k, $B_k$ and $B'_k$ should
grow, but again typically not linearly. For instance, gene A could promote a behavior that
can only be implemented in groups of at least 4 individuals, say. This could be a type 
of large game hunt, that requires 4 hunters. Gene A causes changes to the individual's 
phenotype that make this kind of hunt possible, but at the expense of adding expensive 
muscular and/or brain tissue. Types N just hunt individually small game. We suppose 
that the hunters share their product with the group, and that large game produces greater 



19 



benefits per person in the group, than small game hunt. How will -B^ and -B^ grow when 
k > Al The answer will depend on ecological conditions, and detailed aspects of the 
hunting technique. Can several different groups hunt simultaneously? Would the hunt be 
more efficient with 7 hunters than with 4? If most group members are hunting large game, 
would the productivity of small game hunt increase so as to make it advantageous for the 
group to combine both types of hunt? And so on. 

Example 6. Iterated game. Altruists cooperate conditionally, based on feedback: 

$v^A_k = T_k(-C_k + (k-1)B_k/(n-1))$,
$v^N_k = T_k kB'_k/(n-1)$.

Here $T_k$ can be seen as an average number of repetitions of a basic activity. This class of
models builds on the models in Example 5 in a fashion that generalizes the way Example 2
was built on Example 1. We are supposing that a certain activity presents itself to the group
periodically. The output each time depends on the behavior of the group members, and 
gene A modifies this behavior. Carriers of gene A behave first in a way that is beneficial 
to the group. Afterwards, they will or will not continue acting in this way, depending on
feedback that they receive. For instance, if the activity is a type of collective hunt, they 
will have feedback as they consume the product of the hunt. In Example 2, the feedback 
was a count of the number of participants, and this is also a possibility, but not the only 
one. In Example 2, $T_k$ was a step function, jumping from 1 to T, at k = a + 1. But it seems
also natural to consider smoother functions $T_k$, that increase first slowly, then steeply, and
then saturate. This could result from the fact that the feedback, from each repetition of 
the activity, is subject to random noise, and only gives clear cues to the altruists, outside 
of a critical window of values of k. For values of k in the critical window, altruists may 
repeat the activity a few times, before deciding to stop participating. 

Models represented in Example 5, with non-increasing $C_k$ and non-decreasing $B_k$, have
a natural threshold value of k, where their payoff to altruists becomes positive. In Example
1, this threshold value, k = a + 1, corresponds to the condition (12) that appears then in
Example 2. If the feedback that affects the willingness of altruists to continue participating
in the activities promotes a function $T_k$ that (as in Example 2, with assumption (12)) starts
growing above that threshold, the resulting behavior becomes more adaptive than it would
be without the feedback effect.

Mathematically the models in the current example can be incorporated in Example
5, by modifying the definition of $C_k$, $B_k$ and $B'_k$. But the point of the current example
is to provide justification for, and a mechanism behind, a class of models in which as k
increases, $v^A_k$ switches from small negative values to much larger positive values after a
threshold value of k is crossed. These models can be approximated by and/or compared 
to the simple case given in Example 3. It then becomes natural to analyze that model 
with ratios A/C which can be as large as 100, or 1000. For instance, suppose the species 
under consideration to be early humans, with an adult reproductive lifespan of over 20 
years. If the activity in consideration is repeated with a frequency of 50 per year, and if 
with negative feedback altruists stop participating after about 10 repetitions, then we can
consider a factor of 50 x 20/10 = 100 between $T_k$ with large k and $T_k$ with small k.

Obviously the perceived feedback cannot be the payoff in the model, which is related to 
expected number of offspring in the future. But it is sufficient that the feedback be strongly 
correlated with this payoff. This is not a special problem about altruism: when behavior
is mediated by feedback, natural selection will align reaction to feedback with fitness.

The combination of increasing returns to scale, with the possibility of only pursuing the 
behavior when it is advantageous to the actors, due to a high level of collective participation, 
may have been a powerful set of mechanisms that led to the evolution of altruistic and 
cooperative social traits. This idea has, for instance, been explored in [G], where a model 
of that nature was analyzed in connection with the trait of punishing, at a cost, those 
that do not cooperate with the group. In that model, types A produce first a costly 
signal announcing that they are willing to participate in costly punishment. But they only 
implement the punishment if a sufficiently large number of group members signaled their
willingness to do so too.

Another story that fits with increased returns to scale, modified by feedback induced 
discontinuation, can be suggested for the emergence of "compassionate feelings" for group 
members. A gene that promotes those emotions, would cause their carriers to help fellow 
group members in need, at a cost to themselves. While there are few group members 
carrying this gene, they will not be able to do much for all those that may need help, 
in a large group. Feedback, in the form of frustration for not being able to help as they 
want, may lead them to discontinue their helping activities, in a short while. But with 
enough carriers of that gene in a group, they become able to successfully provide help to 
all that need it. Under this condition they pursue this activity throughout their lives. At 
the same time, under these conditions, the altruists themselves have their fitnesses increased 
substantially, from living in a helping environment. This suggests that this story may be 
represented by a payoff function of the type that we are introducing in the current example, 
with $T_k$ that can reasonably be taken to vary by a factor of 100 or more, as a threshold 
window k is crossed. 

3 Conceptual discussion 

We start next on a long conceptual discussion of our results, and how they may help clarify 
some controversial issues related to the emergence of altruism through natural selection. 

First we consider how our results and the underlying mechanisms revealed by them 
relate to "group selection", "multi-level selection" and "kin selection". This is not the 
place, and neither do we have the expertise to discuss in detail the various nuances of the 
semantics involved in these questions. But a few words are in order, and should be of value 
to the readers. 

The use of the expression "group selection" has changed over the years, and is still 
somewhat controversial. Nevertheless it seems to us that under any reasonable use of 
that expression, group selection is an important force operating in our setting. In our 
framework, individuals belong to groups, groups compete among themselves, and in this 
way the fitness of individuals depends on the constitution of their groups. In our typical 
examples, the fitness of each altruist is strictly smaller than the fitness of the non-altruists 
in its group (Condition (C7)). It is only through the higher fitness of groups with many 
altruists, that the altruistic gene A is then able to survive and spread. Individuals also 
compete for reproduction with other members of their own group, so that in our setting 
group selection is one of the components of a "multi-level selection" process. 

Is the mechanism of kin selection present when the mutant A spreads? We understand 
kin selection as a process in which copies of a gene, originating from a recent common 
ancestor, interact with each other providing themselves with an average fitness large enough 
for this gene to survive and spread. This is precisely what happens in our case, and 
is also what we tried to capture in more detail in the viability criterion and associated 
survival mechanism presented at the end of Section 1. In other cases, the assortment and 
organization of the genes could be caused by kin recognition. In our case it is caused by 
viscosity, that results from the group structure of the population and limited migration. In 
a model in which an isolated mutant has relative fitness lower than the wild type (condition 
(C1) in our case), its only hope for survival is in the creation by chance of a few of its copies, 
that happen to be so arranged that they have higher average fitness than the wild type in 
the population at large, and for the structure of the population to be such that this gene 
can then spread by what we called the survival mechanism. 

In this connection, condition (11) can be seen as providing a "gene's eye view" of 
viability in our setting. As noted above, the left hand side in this condition is simply the 
average fitness of the gene, when it is arranged in the best possible stable way, to assure its 
spreading. It is common to refer to the average fitness of a gene A as its neighbor modulated 
fitness. Under certain conditions it is known that the neighbor modulated fitness can be 
computed also as an "inclusive fitness" , in which one adds the effects of a randomly selected 
gene A from the population on the other genes A. At this point, we are not sure how far 
this method can be extended. (See [21] for a general investigation of this question.) We 
will address later in this paper the issue of the validity of the related Hamilton rule in 
our setting, but we will defer the answer to the question of when the neighbor modulated 
fitness of the gene A can also be computed as an inclusive fitness to a later investigation. 

It is worth also clarifying that in our view multi-level selection and kin selection are 
not the same concept, even if both are central to our study. We see multi-level selection 
as a process in which the demographics and/or the biology (including behavior) associate 
individuals to groups in such a way that the reproductive success of each individual depends 
on the composition of its group. Kin selection can happen, as in our framework, in a multi-level 
selection setting. But it can also happen in populations that are organized in other 
ways, in which individuals are not sorted into groups. Moreover, in our framework, in the 
late stage, when types A exist in numbers comparable with g, multi-level selection will 
continue to be a basic driving force acting on the population. But the average fitness of 
types A will no longer result only from their interaction with other types A that are close 
kin. Whether one should still refer to kin selection as an important force then is an issue 
that we postpone to the paper in preparation in which we study the late stage ([!-]). 

In the way that we use the expressions "multi-level selection" and "kin selection" in the 
discussion above, they are qualitative concepts, rather than computational or accounting 
procedures. These concepts are nevertheless sometimes associated with certain computational 
procedures: multi-level selection is sometimes associated with the Price equation, and 
kin selection is often associated with either neighbor modulated fitness, or inclusive fitness 
computations. But especially in an area in which semantic issues are a source of difficulties, 
one should carefully separate concepts from computational procedures. As we explained 
above, while the concept of neighbor modulated fitness fits easily into our framework, it is 
currently not clear to us that the concept of inclusive fitness could fit as well. The Price 
equation obviously applies in our framework, since it requires minimal conditions, and is 
a natural, mathematically rigorous, tool to consider when studying group-structured pop- 
ulations. We will elaborate below on what it adds to the solution of the specific problems 
that we are addressing in this paper, and why we did not use it in our analysis in Section 
1. 

To decide on the viability of the mutant A in our framework, we see no simpler mathematical 
method (when selection is strong; we will study the important simplifications in 
case of weak selection later in this paper) than computing $\rho$ or $\nu$. This does not mean that 
alternative or complementary methods cannot add insights, intuition and relevant infor- 
mation. Moreover, it is important to understand how different approaches relate to each 
other, and computational parameters relate to experimentally accessible variables. With 
this in mind, we turn now to a discussion of how our mathematical approach compares to 
the use of the Price equation, and to the role of Hamilton's rule in our setting. 

Before proceeding we need to introduce some more notation: 

$$N_0(t) = g - (N_1(t) + \dots + N_n(t)), \qquad f(t) = (N_0(t)/g,\, N_1(t)/g,\, \dots,\, N_n(t)/g), \qquad p(t) = \sum_{k=1}^{n} \frac{k}{n}\, f_k(t).$$

This means that $f(t)$ is the distribution of the various types of groups, including groups of 
type 0, in generation t, and $p(t)$ is the fraction of altruists in the population then. 

The Price equation provides the expected value of $p(t+1)$, when $f(t)$ is given. It can 
be stated in several mathematically equivalent versions. In our setting the simplest one is 

$$\mathbb{E}(p(t+1) \mid f(t)) = p(t)\, \frac{W^A(f(t))}{W(f(t))}, \tag{15}$$

where 

$$W(f) = \sum_k \bar{w}_k f_k, \qquad W^A(f) = \frac{\sum_k k\, w^A_k f_k}{\sum_k k\, f_k},$$

are, respectively, the average fitness of the individuals in the population and the average 
fitness of the altruists in the population, when the distribution of the group types is given 
by $f = (f_0, f_1, \dots, f_n)$. Equation (15) is an immediate consequence of the fact that each 
individual has an expected number of offspring proportional to its relative fitness. By 
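adding and subtracting terms, it can be rewritten as 

$$\begin{aligned}
W(f(t))\,\big(\mathbb{E}(p(t+1) \mid f(t)) - p(t)\big) &= p(t)\,\big(W^A(f(t)) - W(f(t))\big) \\
&= \sum_k (w^A_k - \bar{w}_k)(k/n)\, f_k(t) + \sum_k (\bar{w}_k - W(f(t)))(k/n)\, f_k(t) \\
&= \mathbb{E}\big((w^A_K - \bar{w}_K)(K/n)\big) + \mathrm{Cov}(\bar{w}_K, K/n),
\end{aligned} \tag{16}$$

where $K$ is a random variable with $\mathbb{P}(K = k) = f_k(t)$. In the E.S., since the fraction of 
altruists is negligible in the large population, we have in good approximation $W(f(t)) = 1$, 
reducing (15) to 

$$\mathbb{E}(p(t+1) \mid f(t)) = p(t)\, W^A(f(t)). \tag{17}$$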

Can (15) or the equivalent (16), or the simplified (17) be used to predict when altruism 
can survive in our setting? The answer is negative. In generation $t = 0$ we have $f(0) = 
(1 - 1/g,\, 1/g,\, 0,\, \dots,\, 0)$, and the average fitness of A, which is simply the fitness of the 
only A present, is $W^A(f(0)) = w^A_1 < 1$, under condition (C1). Therefore (17) leads to 
$\mathbb{E}\,p(1) = w^A_1/(ng) < 1/(ng) = p(0)$ for any value of m, as we already knew. If one wants to iterate 
(17) over time, in order to learn under what conditions $\mathbb{E}\,p(t)$ eventually increases and does 
not vanish, one needs to be able to compute $f(1), f(2), \dots$ The well known problem is 
that (17) does not provide information about $f(t+1)$ given $f(t)$. Actually (17) carries no 
information at all about m and how the groups are formed in generation $t+1$. 

The Price equation in its various forms is a useful tool for many purposes. When the 
right hand side is split, as in (16), into two terms that correspond to intragroup 
and intergroup competition, respectively, it carries great heuristic power, and beauty. We hope that, 
nevertheless, our current study may help clarify some of its limitations in the analysis of 
evolution in group structured populations. It is interesting to look into what (17) tells 
us in combination with what we already know about the evolution of our population. 
When the gene A survives, in the S.E.S., (9) implies that $(f_1(t), f_2(t), \dots, f_n(t)) = C(t)\,\nu$, 
where $C(t)$ is time dependent and random, but one-dimensional. Therefore $W^A(f(t)) = 
(\sum_k k\, w^A_k \nu_k)/(\sum_k k\, \nu_k) = \rho$, thanks to (10). We obtain now from (17) 

$$\mathbb{E}(p(t+1) \mid f(t)) = \rho\, p(t).$$

This is compatible with the growth rate given by (6) and (9), and is not new information 
to us, but the consistency is reassuring. When $\rho > 1$, $\mathbb{E}(p(t))$ first decreases, but then 
eventually increases, reaching growth rate $\rho$ in the S.E.S. As explained above, in this 
case either the gene A dies out early on, or else natural selection organizes it in groups 
according to the distribution $\nu$, that is stationary for the evolution driven by $M(A + B)$ and 
maximizes the rate of growth of $p(t)$ among all such stationary distributions. The power of 
(5) over (17) is that it provides the evolution of the whole distribution over group types, 
not just the fraction of altruists in the population. 
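
The splitting of the Price equation in (16) can be checked numerically. The following Python sketch is only an illustration: the function name `price_terms`, the group-type distribution and the fitness values are hypothetical, and it assumes that the mean fitness inside a group with k altruists is $(k\,w^A_k + (n-k)\,w^N_k)/n$.

```python
def price_terms(f, wA, wN):
    """Quantities entering (15) and (16) for a distribution f over group types.

    f[k]  : fraction of groups with k altruists, k = 0..n
    wA[k] : fitness of an altruist in a group with k altruists (wA[0] is unused)
    wN[k] : fitness of a non-altruist in such a group (wN[n] is unused)
    """
    n = len(f) - 1
    ks = range(n + 1)
    # Assumed definition: mean fitness inside a group with k altruists.
    w_bar = [(k * wA[k] + (n - k) * wN[k]) / n for k in ks]
    p = sum(k / n * f[k] for k in ks)                    # fraction of altruists
    W = sum(w_bar[k] * f[k] for k in ks)                 # mean fitness, all individuals
    WA = sum(k * wA[k] * f[k] for k in ks) / sum(k * f[k] for k in ks)
    within = sum((wA[k] - w_bar[k]) * (k / n) * f[k] for k in ks)  # intragroup term
    between = sum((w_bar[k] - W) * (k / n) * f[k] for k in ks)     # Cov(w_bar_K, K/n)
    return p, W, WA, within, between


# Hypothetical example with n = 3 and strength of selection delta = 0.1.
delta = 0.1
f = [0.7, 0.1, 0.1, 0.1]
wA = [1.0] + [1 + delta * (-1 + 3 * (k - 1) / 2) for k in (1, 2, 3)]
wN = [1 + delta * (3 * k / 2) for k in (0, 1, 2)] + [1.0]

p, W, WA, within, between = price_terms(f, wA, wN)
expected_next_p = p * WA / W   # right-hand side of (15)
```

In this example the intragroup term is negative (each altruist is less fit than the non-altruists in its group) while the intergroup covariance is positive. As the discussion above stresses, the computation uses only $f(t)$ and says nothing about $f(t+1)$.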

Next we address the question of whether Hamilton's rule applies in our setting, to pro- 
vide the condition under which altruism, and other genetically determined behaviors that 
are costly to the actor, can spread. The answer here greatly depends on what one means 
by "Hamilton's rule". In its broadest sense, it is natural to use this title for any inequality 
that is a necessary and sufficient condition for altruism to have a positive probability of 
spreading, in other words, for any "viability condition". With this interpretation, (11) is 
a generalized Hamilton rule that is universal in our framework, when started with a single 
altruist. In this paper we will use the terminology "generalized Hamilton rule", or "generalized 
Hamilton condition", and "viability condition" with equivalent meanings (some may 
prefer "survivability condition"). We hope that rather than creating confusion, this usage 
will shift the discussion from a semantic matter into questions with scientific content: How 
does the viability condition (11) simplify in special cases? Can it be formulated in terms of 
concepts of relatedness, costs and benefits? Can it be formulated in terms of experimentally 
meaningful variables? Can it be written in ways that help compare different models? 

We should clarify that here we are considering the question of when the altruistic gene 
A is viable, starting from a single copy of it. Conditions for selection to favor gene A when 
the process is started with a number of copies of A comparable to g are a different problem, 
that we will address when studying the evolution in our framework in the late stages ([60]). 
Here we will only address this further question in the special important case of pairwise 
additive interactions, Example 4, so as to illustrate how things can change when the gene 
A becomes common in the population. 

Before we can analyze how (11) compares in spirit and content with more traditional 
forms of Hamilton's rule, we need to introduce and review several additional concepts. 

Suppose that in generation t, a group is chosen at random, and its members are ordered 
in a random fashion. The chosen group is called the focal group, the first individual in the 
ordering is called the focal individual and the second individual in the ordering is called 
the co-focal individual. Note that this random experiment is equivalent to choosing a focal 
individual at random and then choosing a co-focal individual at random from the other 
$n - 1$ individuals in the focal's group. We will denote by $\mathbb{P}_t$ probabilities that refer to 
the sampling just described, and use $\mathbb{E}_t$ for corresponding expected values. We denote by 
$A_i$ the event that the ith individual in the random ordering of the members of the focal 
group is a type A. And we denote by $I_i$ the indicator of the event $A_i$, i.e., $I_i$ is the random 
variable that takes value 1 if $A_i$ occurs and value 0 if its complement, $A_i^c$, occurs. We 
denote by $K = I_1 + \dots + I_n$ the random number of types A in the focal group. And we set 
$p = K/n$, for the random fraction of altruists in the focal group. 

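Clearly $\mathbb{P}_t(A_i) = p(t)$ does not depend on i and is the fraction of altruists in generation 
t. Linearity of expectations yields the following relationships, that will be of great use: 

$$\mathbb{E}_t(K) = \mathbb{E}_t(I_1) + \dots + \mathbb{E}_t(I_n) = n\,\mathbb{P}_t(A_1) = n\,p(t), \qquad \mathbb{E}_t(p) = p(t). \tag{18}$$

$$\mathbb{E}_t(K - 1 \mid A_1) = \mathbb{E}_t(I_2 \mid A_1) + \dots + \mathbb{E}_t(I_n \mid A_1) = (n-1)\,\mathbb{P}_t(A_2 \mid A_1). \tag{19}$$

$$\mathbb{E}_t(K \mid A_1^c) = \mathbb{E}_t(I_2 \mid A_1^c) + \dots + \mathbb{E}_t(I_n \mid A_1^c) = (n-1)\,\mathbb{P}_t(A_2 \mid A_1^c). \tag{20}$$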
For t in the S.E.S., we have the following fundamental relationship 

$$\mathbb{P}_t(K = k \mid A_1) = \frac{k\,\nu_k}{\sum_{k'} k'\,\nu_{k'}} = \mathbb{P}_{\mathrm{ses}}(K = k), \tag{21}$$

for $k \ge 1$, where the second equality introduces a new notation. To see this, first note that 
conditioning on $A_1$ implies that the gene A has survived into the S.E.S., and therefore (9) 
holds. The sampling, at time t, is therefore of a population that has mostly groups with 
no altruists, but has also a large number (of order $\rho^t$) of groups with altruists, distributed 
according to $\nu$. If the conditioning was on $K \ge 1$, rather than on $A_1$, the conditional 
probability would be $\mathbb{P}_t(K = k \mid K \ge 1) = \nu_k$. We would be just sampling unbiasedly from 
the groups with altruists. But conditioning on $A_1$ introduces size bias: For $k \ge 1$, 

$$\mathbb{P}_t(K = k \mid A_1) = \frac{\mathbb{P}_t(A_1 \mid K = k)\,\mathbb{P}_t(K = k)}{\sum_{k'} \mathbb{P}_t(A_1 \mid K = k')\,\mathbb{P}_t(K = k')} = \frac{(k/n)\,\nu_k\,\mathbb{P}_t(K \ge 1)}{\sum_{k'} (k'/n)\,\nu_{k'}\,\mathbb{P}_t(K \ge 1)} = \frac{k\,\nu_k}{\sum_{k'} k'\,\nu_{k'}}.$$

Using (21), we can now rewrite our viability, or generalized Hamilton rule, (11) as 

$$\sum_k w^A_k\, \mathbb{P}_{\mathrm{ses}}(K = k) = \mathbb{E}_{\mathrm{ses}}(w^A_K) = \rho > 1,$$

which for arbitrary strength of selection $\delta > 0$, is equivalent to 

$$\sum_k v^A_k\, \mathbb{P}_{\mathrm{ses}}(K = k) = \mathbb{E}_{\mathrm{ses}}(v^A_K) > 0. \tag{22}$$


Conditioning on $A_1$ means that the focal individual is a randomly chosen altruist from 
the population. Inequality (22) states that the expected payoff to this individual, from its 
behavior and the behavior of the others in its group, is positive. (This expected value can 
be seen as a "neighbor modulated" expectation.) This starts to look more like a traditional 
Hamilton rule. But one should keep in mind that this is only valid because we assumed 
the sampling to be done during the S.E.S. Had we sampled from the initial generation, 
we would have obtained $\mathbb{E}_0(v^A_K \mid A_1) = v^A_1 < 0$ by (C1), regardless of the value of m. 
Still, with this caveat, (22) carries heuristic power, and adds more meaning to (11). For 
computational purposes, though, one cannot use (22) until one has information on $\nu$. 
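
The effect of the size bias in (21) on the viability condition (22) can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the vector `nu` below is a made-up probability vector standing in for the Perron-Frobenius left-eigenvector (it is not computed from any model), the payoffs `vA` are hypothetical, and the function names are ours.

```python
def size_biased(nu):
    """P_ses(K = k) = k nu_k / sum_k' k' nu_k' for k = 1..n, as in (21).

    nu[k-1] is the weight of groups with k altruists among groups with altruists.
    """
    weights = [k * nu_k for k, nu_k in enumerate(nu, start=1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoff(nu, vA):
    """Left-hand side of (22): payoff averaged over a randomly chosen altruist."""
    return sum(q * v for q, v in zip(size_biased(nu), vA))


# Hypothetical inputs for n = 3 (placeholders, not derived from a model).
nu = [0.6, 0.3, 0.1]     # stand-in for the stationary distribution over group types
vA = [-1.0, 0.5, 2.0]    # payoff of an altruist in a group with k = 1, 2, 3 altruists

biased_mean = expected_payoff(nu, vA)                    # average over altruists
unbiased_mean = sum(x * v for x, v in zip(nu, vA))       # average over groups
viable = biased_mean > 0                                 # condition (22)
```

With these numbers the average over groups is negative while the average over altruists is positive, because altruists are over-represented in groups with many altruists. Condition (22) uses the latter.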

In order to further pursue the relationship between (22) and the expressions traditionally 
known as Hamilton's rule, we need to consider special examples. And before we can do it, 
we need to introduce the relatedness into our framework. 

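If we define relatedness as the regression coefficient of $I_2$ on $I_1$ (or equivalently, of 
$1 - I_2$ on $1 - I_1$), we have 

$$\begin{aligned}
r_t &= \frac{\mathrm{Cov}_t(I_1, I_2)}{\mathrm{Var}_t(I_1)} = \frac{\mathbb{P}_t(A_1 A_2) - (p(t))^2}{p(t)(1 - p(t))} = \frac{\mathbb{P}_t(A_2 \mid A_1) - p(t)}{1 - p(t)} \\
&= \frac{\mathrm{Cov}_t(1 - I_1, 1 - I_2)}{\mathrm{Var}_t(1 - I_1)} = \frac{\mathbb{P}_t(A_1^c A_2^c) - (1 - p(t))^2}{p(t)(1 - p(t))} = \frac{\mathbb{P}_t(A_2^c \mid A_1^c) - (1 - p(t))}{p(t)}.
\end{aligned}$$

Equivalently, $r_t$ can be defined by each one of the following four identities: 

$$\begin{aligned}
\mathbb{P}_t(A_2 \mid A_1) &= r_t + (1 - r_t)\,p(t), & \mathbb{P}_t(A_2^c \mid A_1) &= (1 - r_t)(1 - p(t)), \\
\mathbb{P}_t(A_2 \mid A_1^c) &= (1 - r_t)\,p(t), & \mathbb{P}_t(A_2^c \mid A_1^c) &= r_t + (1 - r_t)(1 - p(t)).
\end{aligned} \tag{23}$$

In particular, 

$$r_t = \mathbb{P}_t(A_2 \mid A_1) - \mathbb{P}_t(A_2 \mid A_1^c). \tag{24}$$

Combining (24) with (19) and (20) yields 

$$r_t = \frac{\mathbb{E}_t(K - 1 \mid A_1) - \mathbb{E}_t(K \mid A_1^c)}{n - 1} = \frac{n\,\big(\mathbb{E}_t(p \mid A_1) - \mathbb{E}_t(p \mid A_1^c)\big) - 1}{n - 1}. \tag{25}$$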

The relatedness $r_t$ is closely associated to Wright's $F_{ST}$ statistics, defined as: 

$$F_{ST,t} = \frac{\mathrm{Var}_t(p)}{\mathrm{Var}_t(p) + \mathbb{E}_t(p(1 - p))}.$$

(The numerator measures intergroup variability, and the second term in the denominator 
measures average intragroup variability.) To see how $F_{ST,t}$ relates to $r_t$, we write 

$$\begin{aligned}
\mathrm{Var}_t(K) = \mathrm{Var}_t(I_1 + \dots + I_n) &= n\,\mathrm{Var}_t(I_1) + n(n-1)\,\mathrm{Cov}_t(I_1, I_2) \\
&= n\,p(t)(1 - p(t)) + n(n-1)\,p(t)(1 - p(t))\,r_t \\
&= n\,p(t)(1 - p(t))\,(1 + (n-1)\,r_t),
\end{aligned}$$

and 

$$\begin{aligned}
\mathrm{Var}_t(p) + \mathbb{E}_t(p(1 - p)) &= \mathbb{E}_t(p^2) - (\mathbb{E}_t(p))^2 + \mathbb{E}_t(p) - \mathbb{E}_t(p^2) \\
&= -(p(t))^2 + p(t) = p(t)(1 - p(t)).
\end{aligned}$$

Since $p = K/n$, we obtain, 

$$F_{ST,t} = \frac{1 + (n-1)\,r_t}{n}, \quad \text{or, equivalently,} \quad r_t = \frac{n\,F_{ST,t} - 1}{n - 1}. \tag{26}$$

In particular, when n is large $F_{ST,t}$ is close to $r_t$. A by-product of the computations above 
is the identity 

$$F_{ST,t} = \mathbb{E}_t(p \mid A_1) - \mathbb{E}_t(p \mid A_1^c),$$

which follows from comparing (26) with (25). This identity can also be rewritten as 

$$F_{ST,t} = \frac{\mathbb{E}_t(p\,I_1)}{p(t)} - \frac{\mathbb{E}_t(p\,(1 - I_1))}{1 - p(t)} = \frac{\mathrm{Cov}_t(p, I_1)}{p(t)(1 - p(t))} = \frac{\mathbb{E}_t(p \mid A_1) - p(t)}{1 - p(t)},$$

showing that $F_{ST,t}$ is the regression coefficient of $p$ on $I_1$. 
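
Relation (26) between $F_{ST,t}$ and $r_t$ can be verified numerically for any distribution of group types. The Python sketch below is illustrative only: the function name and the example distribution `f` are hypothetical. It uses the fact that, under a random ordering of the members of a group with k altruists, the first two individuals are both altruists with probability $k(k-1)/(n(n-1))$.

```python
def relatedness_and_fst(f):
    """Return (p, r, F_ST) for a distribution f[k], k = 0..n, of group types.

    Requires n >= 2 and 0 < p < 1.
    """
    n = len(f) - 1
    ks = range(n + 1)
    p = sum(k / n * f[k] for k in ks)                        # P_t(A_1)
    # P_t(A_1 A_2): focal and co-focal are both altruists.
    p_both = sum(k * (k - 1) / (n * (n - 1)) * f[k] for k in ks)
    r = (p_both - p ** 2) / (p * (1 - p))                    # regression of I_2 on I_1
    var_p = sum((k / n) ** 2 * f[k] for k in ks) - p ** 2    # intergroup variability
    mean_within = sum((k / n) * (1 - k / n) * f[k] for k in ks)  # E_t(p(1 - p))
    fst = var_p / (var_p + mean_within)
    return p, r, fst


# Hypothetical distribution of group types for n = 3.
n = 3
f = [0.5, 0.1, 0.1, 0.3]
p, r, fst = relatedness_and_fst(f)
```

For this example both sides of (26) agree to rounding error, and the two quantities differ noticeably because n is small.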

Our framework allows for a very natural definition of "kin" and "genetical identity by 
descent", IBD for short (see Fig. 9). Say that two individuals in the same generation are 
$l$-kin in case they share a common ancestor at most $l$ generations in their past. (The 1-kin 
of an individual are its siblings, its 2-kin are its siblings and cousins, etc.) Because the 
number of groups g is large, individuals from different groups have negligible probability 
of being $l$-kin, unless $l$ is comparable to g. As for individuals in the same group, migration 
events in their lineages play a major role in their being or not kin. If, when we follow their 
lineages back in time, we find a migration event in one of these lineages before they coalesce, 
then we know that the probability that they will coalesce in a time that is much shorter 
than g is negligible. With this in mind, say that two individuals in the same generation 
are IBD in case following their lineages backwards in time, they coalesce before (in the 
backwards sense) a migration event happens in either one. From the discussion above, we 
know that being IBD is essentially the same as being $l$-kin for some $l$ that may have to be 
large, but is not comparable to g. 

We denote by D the event that the focal and the co-focal individuals are IBD, and 
define genetic relatedness for the allele A by 

$$R_t = \mathbb{P}_t(D \mid A_1).$$

In the E.S., $\mathbb{P}_t(A_1) = \mathbb{P}_t(A_2) = p(t)$ is negligible, and so is also $\mathbb{P}_t(A_1 A_2 D^c)$ (we adopt 
the common convention of omitting the intersection symbol $\cap$). This implies that $r_t = 
\mathbb{P}_t(A_2 \mid A_1) = \mathbb{P}_t(A_2 D \mid A_1) + \mathbb{P}_t(A_2 D^c \mid A_1) = \mathbb{P}_t(D \mid A_1) = R_t$. Using also (19), we have 
then 

$$R_t = r_t = \mathbb{P}_t(A_2 \mid A_1) = \frac{\mathbb{E}_t(K - 1 \mid A_1)}{n - 1}, \quad \text{for t in the E.S.}$$

Combining this with (21), gives that 

$$R_t = r_t = \frac{\mathbb{E}_{\mathrm{ses}}(K - 1)}{n - 1} = R_{\mathrm{ses}}, \quad \text{for t in the S.E.S.} \tag{27}$$



We look now into how the generalized Hamilton's rule (22) can be written in the case 
in which $v^A_k$ is a linear function of k. There is no loss in generality in supposing that 
this linear function can be written as $v^A_k = -C + (k - 1)B/(n - 1)$, with B, C constant. 
We could, for instance, be considering the public goods game, Example 1, or an additive 
pairwise interaction, Example 4. But note that we are not making any assumption about 
$v^N_k$, so that all that we are assuming is that at a cost C to itself, each altruist provides a 
benefit $B/(n - 1)$ to each other altruist in its group. They may be providing benefits to 
non-altruists or not, and if they do, the amount of that benefit is irrelevant for the current 
computation. We do not need, in particular, to assume linearity of $v^N_k$ in k. We are also 
not yet assuming any restrictions on the values of C and B. 

The linearity of $v^A_k$ can be exploited by using (27) to write 

$$\mathbb{E}_{\mathrm{ses}}(v^A_K) = -C + \frac{B}{n - 1}\,\mathbb{E}_{\mathrm{ses}}(K - 1) = -C + B\,R_{\mathrm{ses}}.$$

This transforms (22) into the familiar expression of Hamilton's rule: 

$$C < B\,R_{\mathrm{ses}}, \tag{28}$$

whenever $v^A_k = -C + (k - 1)B/(n - 1)$. But note that even in this case, in which B and 
C are constants, the relatedness $R_{\mathrm{ses}}$ complicates the application of the rule. To compute 
$R_{\mathrm{ses}}$ theoretically, one still needs information about $\nu$. And if analyzing data from an 
experiment that is supposed to be well modeled by our framework, one would have to 
sample from the S.E.S., not from the very early stage (before stationarity settles in), or 
from the late stage when the number of altruists is no longer negligible as compared to 
the size of the whole population, so that the group type distribution is no longer directly 
related to $\nu$. For field data, this remark may make (28) of little value. Fortunately, this 
problem disappears in the case of weak selection, as we will see later. 

Condition (28) refers to the viability of a single new mutant A. It is clear that if we 
started from any number of copies of the gene A that were negligible as compared to g, the 
same arguments would apply and lead to the same rule. But had we started in generation 
0 from a distribution of groups with a non-negligible fraction of altruists, (28) would not 
in general provide us with the direction of evolution. Indeed, even the equilibria between 
alleles A and N, in the late stage of the evolution, will not in general turn the inequalities 
in (28) into equalities, unless $v^N_k = kB/(n - 1)$. This point, that is emphasized in [70], is 
well illustrated by considering Example 4, in which $v^N_k = kB'/(n - 1)$. As is well known, 
for arbitrary $v^A_k$ and $v^N_k$, we can write the Price equation (15), or (16) also in the following 
form, where $f = f(t)$: 

$$W(f)\,\big(\mathbb{E}(p(t+1) \mid f(t)) - p(t)\big) = p(t)(1 - p(t))\,\big(W^A(f) - W^N(f)\big),$$

where 

$$W^A(f) = \frac{\sum_k w^A_k\, k\, f_k}{\sum_k k\, f_k} = \mathbb{E}_t(w^A_K \mid A_1),$$

and 

$$W^N(f) = \frac{\sum_k w^N_k\, (n - k)\, f_k}{\sum_k (n - k)\, f_k} = \mathbb{E}_t(w^N_K \mid A_1^c).$$

(The second equality in each one of these last two displays is analogous to (21), 
corresponding to averages over size biased samplings of the distribution $f$.) In case 
$v^A_k = -C + (k - 1)B/(n - 1)$ and $v^N_k = kB'/(n - 1)$, we can use (19) and (20), to reduce these 
to 

$$\begin{aligned}
W^A(f) &= 1 + \delta\,\mathbb{E}_t(-C + (K - 1)B/(n - 1) \mid A_1) = 1 + \delta\,\big(-C + B\,\mathbb{P}_t(A_2 \mid A_1)\big), \\
W^N(f) &= 1 + \delta\,\mathbb{E}_t(K B'/(n - 1) \mid A_1^c) = 1 + \delta\, B'\,\mathbb{P}_t(A_2 \mid A_1^c).
\end{aligned}$$

Hence, 

$$\begin{aligned}
W^A(f) - W^N(f) &= \delta\,\big(-C + B\,\mathbb{P}_t(A_2 \mid A_1) - B'\,\mathbb{P}_t(A_2 \mid A_1^c)\big) \\
&= \delta\,\big(-C + B\,(\mathbb{P}_t(A_2 \mid A_1) - \mathbb{P}_t(A_2 \mid A_1^c)) + (B - B')\,\mathbb{P}_t(A_2 \mid A_1^c)\big) \\
&= \delta\,\big(-C + B\,r_t + D\,(1 - r_t)\,p(t)\big),
\end{aligned}$$

where $D = B - B'$ is the synergy parameter, and we used (23) and (24). When $0 < p(t) < 1$, 
and $\delta > 0$, the necessary and sufficient condition for $\mathbb{E}(p(t+1) \mid f(t)) > p(t)$, is therefore 

$$C < B\,r_t + D\,(1 - r_t)\,p(t). \tag{29}$$

Condition (29) was derived in the case n = 2 in [/ i], who called it Queller's rule, giving 
credit to the work in ["(i]. We refer the reader to [63], for an extension of Queller's rule 
when $v^A_k$ and $v^N_k$ are not linear functions of k. 
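
The algebra leading to (29) can be checked numerically by computing $W^A(f) - W^N(f)$ directly from the size-biased averages and comparing it with $\delta(-C + B r_t + D(1 - r_t)p(t))$. The Python sketch below is only an illustration under the linear payoffs stated above; the function name, the distribution `f` and the parameter values are hypothetical.

```python
def queller_check(f, B, B_prime, C, delta):
    """Return (W^A - W^N computed directly, delta * (-C + B r + D (1 - r) p)).

    f[k] is the fraction of groups with k altruists, k = 0..n (n >= 2, 0 < p < 1).
    Payoffs: v^A_k = -C + (k - 1) B / (n - 1),  v^N_k = k B' / (n - 1).
    """
    n = len(f) - 1
    ks = range(n + 1)
    vA = [-C + (k - 1) * B / (n - 1) for k in ks]
    vN = [k * B_prime / (n - 1) for k in ks]
    # Size-biased averages: fitness seen by a random altruist / non-altruist.
    WA = sum((1 + delta * vA[k]) * k * f[k] for k in ks) / sum(k * f[k] for k in ks)
    WN = (sum((1 + delta * vN[k]) * (n - k) * f[k] for k in ks)
          / sum((n - k) * f[k] for k in ks))
    p = sum(k / n * f[k] for k in ks)
    p_both = sum(k * (k - 1) / (n * (n - 1)) * f[k] for k in ks)
    r = (p_both - p ** 2) / (p * (1 - p))      # relatedness as regression coefficient
    D = B - B_prime                            # synergy parameter
    return WA - WN, delta * (-C + B * r + D * (1 - r) * p)


# Hypothetical example with n = 3.
direct, from_rule = queller_check([0.5, 0.1, 0.1, 0.3], B=3.0, B_prime=2.0, C=1.0, delta=0.1)
favored = from_rule > 0    # condition (29)
```

The two returned numbers coincide up to rounding, so the sign of the direct fitness difference is the sign predicted by (29).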

Consider now the threshold model in Example 3. In this case $\mathbb{E}_{\mathrm{ses}}(v^A_K) = -C + 
A\,\mathbb{P}_{\mathrm{ses}}(K \ge \theta)$, so that the generalized Hamilton rule (22) reads 

$$C < A\,\mathbb{P}_{\mathrm{ses}}(K \ge \theta). \tag{30}$$

Comparing (28) with (30) is elucidating. Both make the heuristic power of (22) apparent: 
in both cases the cost incurred by carrying the gene A must be compensated by the 
expected benefit that it brings to its carriers. We will address in more detail in the next 
paragraph the fact that this expected value is in the particular distribution $\mathbb{P}_{\mathrm{ses}}$, related 
to $\nu$. For the moment we focus on the fact that while (28) is in the usual Hamilton's 
rule form, (30) is not. We do not see anything deep about this difference. The special 
form of (28), involving relatedness, is a feature of the linearity of $v^A_k$, that allows one to 
break the computation of an expectation into a sum of correlations between the genotype 
of the focal and each one of its group companions, taken one at a time. In the case of 
(30), the threshold nature of $v^A_k$ does not allow for such a decomposition in any meaningful 
and simple way, as far as we can see. Trying to rewrite $\mathbb{E}_{\mathrm{ses}}(v^A_K)$ in terms of the 
relatedness parameter $R_{\mathrm{ses}}$ in this case, does not seem a natural idea. But there isn't 
anything really that fundamental about relatedness in (22). What is important there is 
the distribution of the random variable K, i.e., the values of $\mathbb{P}_{\mathrm{ses}}(K = k)$, $k = 1, \dots, n$. 
For a threshold model, the particular feature of this distribution that is of relevance is 
the tail probability $\mathbb{P}_{\mathrm{ses}}(K \ge \theta)$. For a linear model the relevant aspect of this distribution 
is the expectation $\mathbb{E}_{\mathrm{ses}}(K)$, that is naturally expressed in terms of relatedness, 
since $\mathbb{E}_{\mathrm{ses}}(K) = \mathbb{E}_{\mathrm{ses}}(K - 1) + 1 = (n - 1)R_{\mathrm{ses}} + 1$, by (27). Trying to rewrite (30) in 
terms of relatedness seems as unnatural to us as trying to rewrite (28) in terms of the tail 
probabilities $\mathbb{P}_{\mathrm{ses}}(K \ge k)$. In this connection, we refer the reader also to the conceptual 
discussion on this model, in the case n = 3, in [70] (where it was called "stag hunt game"), 
[44], [71] and [20]. Especially important is the fact that in [20] both (29) and (30) are 
written in the traditional Hamilton form $c < br$. This is accomplished there at the cost of 
defining c and b as appropriate regression coefficients, that depend not only on parameters 
in the payoff functions, but also on the distribution of genes A and N in the population 
(see, for instance, (12) and (13) in that paper). Conceptually, this raises the question of 
what c and b mean in Hamilton's rule. Computationally, we have found it easier to compute 
directly with (29), (30) and more generally with the generalized Hamilton rule (11), 
or equivalently, (22), rather than with the methods from [20]. Their c and b vary in time, 
as the distribution of genes A and N changes. For instance, to rewrite our (30) in their 
Hamilton form $c < br$, one has to consider the distributions of these genes in the S.E.S. 
Therefore, one has to first find $\nu$ and then use it in the computations of c and b as regression 
coefficients. But once $\nu$ is obtained, (30) is available with little additional computational 
work, since $\mathbb{P}_{\mathrm{ses}}(K \ge \theta) = \sum_{k \ge \theta} k\,\nu_k / \sum_{k=1}^{n} k\,\nu_k$. For this matter, obtaining $\rho$ directly, 
as Perron-Frobenius eigenvalue of $M(A + B)$, and using (11) is, of course, easier than 
using either (30), or the methods from [20]. This raises the question whether (30) is ever of 
computational value. The answer is positive, since in Section 5 we will see that (22) and, 
in particular, (30) yield simple and elegant formulas, when $\delta$ is small, and n is large (see 
(56), (57) and the related Fig. 11). 
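
The contrast between (28) and (30) can be made concrete: both depend on $\nu$ only through the size-biased distribution $\mathbb{P}_{\mathrm{ses}}$, the linear model through its mean and the threshold model through its tail. The Python sketch below is illustrative; `nu` is a made-up stand-in for the Perron-Frobenius left-eigenvector, and the values of B, C, A and $\theta$ are hypothetical.

```python
def ses_distribution(nu):
    """Size-biased distribution P_ses(K = k), k = 1..n, built from nu as in (21)."""
    weights = [k * nu_k for k, nu_k in enumerate(nu, start=1)]
    total = sum(weights)
    return [w / total for w in weights]


def relatedness_ses(nu):
    """R_ses = E_ses(K - 1) / (n - 1), as in (27)."""
    n = len(nu)
    q = ses_distribution(nu)
    mean_k = sum(k * q_k for k, q_k in enumerate(q, start=1))
    return (mean_k - 1) / (n - 1)


def tail_ses(nu, theta):
    """Tail probability P_ses(K >= theta) appearing in (30)."""
    q = ses_distribution(nu)
    return sum(q_k for k, q_k in enumerate(q, start=1) if k >= theta)


# Hypothetical inputs for n = 3 (placeholders, not derived from a model).
nu = [0.6, 0.3, 0.1]
R_ses = relatedness_ses(nu)
tail = tail_ses(nu, theta=2)

linear_viable = 1.0 < 3.0 * R_ses       # (28) with C = 1, B = 3
threshold_viable = 1.0 < 1.5 * tail     # (30) with C = 1, A = 1.5
```

With these placeholder numbers the linear model satisfies (28) while the threshold model fails (30), even though both are evaluated on the same distribution. In the strong-selection setting $\nu$ itself would differ between the two models, which this sketch ignores.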

The viability condition (22) and its special cases (28) and (30) are heuristically meaningful, 
but computationally and experimentally they may not be such an advance over (11). 
The distribution $\nu$ is built into $\mathbb{E}_{\mathrm{ses}}(\cdot)$, and in particular into $R_{\mathrm{ses}}$ and $\mathbb{P}_{\mathrm{ses}}(K \ge \theta)$. 
Computationally, finding $\nu$ is at least as demanding as finding $\rho$. Experimentally, one would 
not expect to sample in nature from the S.E.S., but rather from the late stage, after A 
has invaded and is now either fixed, or in a polymorphic equilibrium with N. (One can 
conceive, though, lab experiments in which one could sample from the S.E.S.) Fortunately, 
these problems disappear when selection is weak, as we will see next. 

4 Weak selection and further conceptual discussion 

We turn now to simplifications to our analysis in case of small $\delta$, i.e., weak selection. In this 
case it is well known (see, e.g., [59], [40]) that there is a separation of time scales. Most of 
the time in most places evolution is occurring as if $\delta$ were 0, i.e., via neutral genetic drift. 
Only occasionally events happen that are caused by the slight differences in fitness of the 
individuals. Mathematically, what one gains is the possibility of studying the process as 
a perturbation of the neutral evolution. For us, this will be encapsulated in a result that 
we state next. We will include $\delta$ now as a superscript in the notation of quantities that 
depend on it (e.g., $\mathbb{P}^{\delta}_{\mathrm{ses}}$, $R^{\delta}_{\mathrm{ses}}$, $R^{\delta}_t$, etc.). 

We observe that when $\delta > 0$ is small enough, the generalized Hamilton rule (11) and 
its equivalent (22) can be replaced by the requirement that 

$$\sum_k v^A_k\, \mathbb{P}^{0}_{\mathrm{ses}}(K = k) = \mathbb{E}^{0}_{\mathrm{ses}}(v^A_K) > 0. \tag{31}$$

Indeed, (31) is an immediate consequence of (22), due to the continuity of the Perron-Frobenius 
left-eigenvector of the matrix $M^{\delta}(A + B)$ as a function of $\delta$. As $\delta \to 0$, the 
vector $\nu^{\delta}$ converges to $\nu^{0}$, the Perron-Frobenius left-eigenvector of the matrix $M^{0}(A + B)$. 
Importantly, $M^{0}(A + B)$, and hence also $\nu^{0}$, does not depend at all on the payoff functions. 
They depend only on n and m. This is illustrated in Fig. 8, where one can see that $\nu^{\delta}$ is 
model dependent for large $\delta$, but becomes model independent under weak selection. 

The important simplification in (31), with respect to (22), is that $\mathbb{P}^{\delta}_{\mathrm{ses}}$ has been replaced 
by $\mathbb{P}^{0}_{\mathrm{ses}}$ in the left hand side of (22). In particular, (31) is not affected by the values of $v^N_k$. 
This may seem surprising at first sight, but can be understood as follows. In the E.S., most 
type N individuals are in groups without altruists, so that the mean fitness of the gene N 
is close to 1, not depending on the $v^N_k$. The values of the $v^N_k$ do affect the distribution $\nu^{\delta}$, 
because they affect the fate of the groups with altruists, and in this way affect the mean 
fitness of the gene A. (This is how the $v^N_k$ affect (22) and its special cases (28) and (30).) 
But when $\delta$ is small, $\nu^{\delta}$ is close to $\nu^{0}$, and its dependence on $v^A_k$ and $v^N_k$ is a perturbation 
of order $\delta$. This dependence produces only a second order effect (of order $\delta^2$) on $\rho^{\delta}$. 

Conceptually, one can see (31) as a separation of the effects from demographics from 
those of fitness. The distribution $\nu^{0}$ is completely determined by the demographics, while 
the payoffs $v^A_k$ carry only information about the fitness function. This separation makes the 
concept of neighbor modulated fitness particularly appealing in the regime of weak selection. 
The viability condition (31) says that the mutant gene A is viable in case its average 
(neighbor modulated) fitness is larger than that of the wild type N before A appeared, 
with the weights in the average being given by the fashion in which the demographics 
alone arranges the genes in the population. 

For theoretical analysis, (31) is much simpler than (11), or (22). One can compute 
once and for all the distributions $\nu^{0}$, that depend only on n and m. Then, in analyzing a 
model (given by $v^A_k$ and $v^N_k$), one simply looks at the average value of $v^A_k$ with respect to 
the universal distribution $\nu^{0}$. This allows for a much greater intuition of what to expect, 
when comparing models, than was possible in the case of strong selection, in which $v^A_k$ and 
$v^N_k$ also affect the distribution $\nu^{\delta}$. For instance, if two models I and II are comparable 
in the sense that $v^{A,\mathrm{I}}_k \le v^{A,\mathrm{II}}_k$, $k = 1, \dots, n$, then we have $m_{\mathrm{I}} \le m_{\mathrm{II}}$. This allows one to 
use a simple model, like the threshold model in Example 3, to obtain estimates (say, lower 
bounds) on the value of $m$ for more interesting and realistic models, like those in Examples 
2, 5 and 6. 

The viability condition (31) still refers to the S.E.S., but this can be overcome by considering a different random variable instead of $K$. Define $K^I$ as the number of individuals in the group to which the focal belongs that are IBD to the focal (the focal included in the counting). Clearly $K^I \le K$, and, with overwhelming probability, $K^I = K$ in the E.S. Observe that $\mathbb{P}^0_t(K^I = k \mid A_1) = \mathbb{P}^0_t(K^I = k)$, for all $t$, since conditioning on $A_1$ does not affect lineages when $\delta = 0$. But while $\mathbb{P}_t(K = k \mid A_1)$ depends on the fraction of altruists in the population, and so changes with time, $\mathbb{P}^0_t(K^I = k)$ becomes constant for $t \gg 1$ (or, more precisely, for $t \gg 1/m$, so that by time $t$ it is likely that migration events have occurred in the focal's lineage). We denote by $\pi = (\pi_1, \dots, \pi_n)$ this equilibrium distribution:

$$\pi_k = \mathbb{P}^0_t(K^I = k), \quad \text{for } t \gg 1.$$

Considering $t$ in the S.E.S., which has both $t \gg 1$ and $K^I = K$, gives then

$$\pi_k = \mathbb{P}^0_t(K^I = k) = \mathbb{P}^0_t(K^I = k \mid A_1) = \mathbb{P}^0_{ses}(K = k) = \frac{k \nu^0_k}{\sum_{k'} k' \nu^0_{k'}}. \tag{32}$$

And the viability condition (31) can be rewritten as

$$\sum_k v^A_k \pi_k > 0. \tag{33}$$

This is a major improvement over (11) and (22), since now no reference to the S.E.S. is left. The distribution $\pi$ only depends on the structure of the population and how genes flow under neutral drift. Experimentally it can be accessed by sampling any neutral genetic markers from the population in demographic equilibrium (the condition $t \gg 1$).

The distribution $\pi$ is directly associated to the relatedness $R^0_t$, $t \gg 1$, in the same way that the distribution $\mathbb{P}^\delta_{ses}(K = k)$ is associated to $R^\delta_{ses}$. Recall the enumeration of the members of the focal group, in which the first individual is the focal. Now decompose $K^I - 1 = I_2 + \cdots + I_n$, where $I_j = 1$ if the $j$th individual in this ordering is IBD to the focal, and $I_j = 0$ otherwise. This yields $\mathbb{E}^0_t(K^I - 1) = \mathbb{E}^0_t(I_2) + \cdots + \mathbb{E}^0_t(I_n) = (n-1)\mathbb{P}^0_t(D)$, and therefore

$$R^0_t = \mathbb{P}^0_t(D \mid A_1) = \mathbb{P}^0_t(D) = \frac{\mathbb{E}^0_t(K^I) - 1}{n - 1} = \frac{\sum_k k \pi_k - 1}{n - 1} = R^0, \tag{34}$$

for $t \gg 1$, where the last equality introduces a new notation.

One can compute $R^0$ relatively easily, as follows. If either the focal or the co-focal are migrants, they are not IBD. If they are both non-migrants, they chose their parents from the parental group of their group. With probability $1/n$ they chose the same parent, and are therefore IBD. With probability $1 - 1/n$ they chose different parents. In this last case they are IBD exactly if their parents are IBD, and this event has probability $R^0$. Assembling these pieces, we obtain $R^0 = (1-m)^2 \left(1/n + (1 - 1/n) R^0\right)$ and hence,

$$R^0 = \frac{(1-m)^2}{n - (n-1)(1-m)^2} \approx \frac{1}{1 + 2nm}, \tag{35}$$

where the approximation is good when $m$ is small. This is a well known result by Wright for the infinite islands model, for haploid individuals. It is important to clarify that even when $\delta = 0$, our framework is not identical to the infinite islands model. In that model there are a large number $g$ (to be taken to $\infty$ in the computations) of islands, each one with $n$ individuals. In generation $t + 1$ the $n$ individuals in each island choose, independently, a parent from the individuals in the same island in generation $t$. This is followed by migration at rate $m$, that is implemented in exactly the same way as in our framework. Differently from our framework, each island is "parent" to exactly one island in the next generation. In our framework, when $\delta = 0$, each group is parent to a random number, with distribution $\mathrm{Bin}(g, 1/g)$, of groups. In spite of this difference, it is not simply a coincidence that leads to the same formula (35) in both frameworks. They share the same coalescence structure, when we ask ourselves questions related to IBD. Not only is $R^0$ identical in these frameworks, but so is also the distribution $\pi$. Indeed, when we follow lineages backwards in time, until there are migration events, it does not matter if the group that we are following from generation to generation is the same (as in the infinite islands case) or changes (as in our case).
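As a small numerical illustration of the derivation of $R^0$, the sketch below iterates the recursion $R \leftarrow (1-m)^2 (1/n + (1-1/n)R)$ and compares the result with the closed form and with the small-$m$ approximation $1/(1+2nm)$. The function names and the parameter values are illustrative choices, not part of the original text.

```python
def relatedness_closed_form(n: int, m: float) -> float:
    """Closed form R0 = (1-m)^2 / (n - (n-1)(1-m)^2)."""
    s2 = (1.0 - m) ** 2
    return s2 / (n - (n - 1) * s2)


def relatedness_by_iteration(n: int, m: float, steps: int = 10_000) -> float:
    """Iterate R <- (1-m)^2 (1/n + (1 - 1/n) R), starting from R = 0.

    The map is a contraction with factor (1-m)^2 (1 - 1/n) < 1,
    so the iteration converges to the unique fixed point.
    """
    s2 = (1.0 - m) ** 2
    r = 0.0
    for _ in range(steps):
        r = s2 * (1.0 / n + (1.0 - 1.0 / n) * r)
    return r


def relatedness_small_m(n: int, m: float) -> float:
    """Approximation 1 / (1 + 2 n m), intended for small m."""
    return 1.0 / (1.0 + 2.0 * n * m)


if __name__ == "__main__":
    n, m = 50, 0.01  # illustrative values
    print(relatedness_closed_form(n, m))
    print(relatedness_by_iteration(n, m))
    print(relatedness_small_m(n, m))
```

For $n = 50$ and $m = 0.01$ the first two numbers agree to machine precision, and the approximation differs from them only in the third decimal place.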

We will show next that

$$r^0_t = R^0, \tag{36}$$

when $t \gg 1$, so that $R^0$ can also be obtained experimentally from standard statistical regression methods (for instance using (26)), applied to neutral genetic markers, by sampling from a single generation $t$, without information about past generations or any knowledge about kinship.

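To emphasize where in the proof of (36) the assumption $\delta = 0$ will matter, we first write, for arbitrary $\delta \ge 0$,

$$\begin{aligned}
r^\delta_t &= \mathbb{P}^\delta_t(A_2 \mid A_1) - \mathbb{P}^\delta_t(A_2 \mid A_1^c) \\
&= \mathbb{P}^\delta_t(A_2 \mid D A_1)\,\mathbb{P}^\delta_t(D \mid A_1) + \mathbb{P}^\delta_t(A_2 \mid D^c A_1)\,\mathbb{P}^\delta_t(D^c \mid A_1) \\
&\quad - \mathbb{P}^\delta_t(A_2 \mid D A_1^c)\,\mathbb{P}^\delta_t(D \mid A_1^c) - \mathbb{P}^\delta_t(A_2 \mid D^c A_1^c)\,\mathbb{P}^\delta_t(D^c \mid A_1^c).
\end{aligned}$$

Clearly $\mathbb{P}^\delta_t(A_2 \mid D A_1) = 1$ and $\mathbb{P}^\delta_t(A_2 \mid D A_1^c) = 0$. The fact that $\delta = 0$ allows the following simplifications:

$$R^0_t = \mathbb{P}^0_t(D \mid A_1) = \mathbb{P}^0_t(D) = \mathbb{P}^0_t(D \mid A_1^c),$$

so that also

$$\mathbb{P}^0_t(D^c \mid A_1) = 1 - R^0_t = \mathbb{P}^0_t(D^c \mid A_1^c)$$

and

$$\mathbb{P}^0_t(A_2 \mid D^c A_1) = \mathbb{P}^0_t(A_2 \mid D^c) = \mathbb{P}^0_t(A_2 \mid D^c A_1^c).$$

(This would generally not hold with $\delta > 0$, since then information on the occurrence or not of $A_1$ biases the lineage of the co-focal, even if it does not meet that of the focal.) With the simplifications above, the two terms involving $D^c$ cancel, leaving $r^0_t = R^0_t$, and we readily obtain (36).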

The identity (35) shows that, as expected, $R^0$ is a strictly decreasing function of $m$. This identity can be inverted as

$$m = 1 - \sqrt{\frac{n R^0}{1 + (n-1) R^0}}. \tag{37}$$

With fixed $n$ there is a 1-to-1 correspondence between $0 \le m \le 1$ and $0 \le R^0 \le 1$, given by (35) and (37). It is natural then, in the regime of weak selection, to define the critical value of $R^0$ as

$$R^0_s = \frac{(1 - m_s)^2}{n - (n-1)(1 - m_s)^2}. \tag{38}$$

This is the least level of relatedness that makes the gene A viable in our framework, under weak selection. Even when selection is not weak, we can define $R^0_s$ via (38), since $R^0$ is a natural alternative way to identify the value of $m$, through (35), and can be measured using neutral genetic markers. See Fig. 13 for the shape, on a logarithmic scale, of the function $m \mapsto R^0$ given by (35).

Under weak selection, when $v^A_k = -C + (k-1)B/(n-1)$ is a linear function of $k$, as in the public goods game, or additive pairwise interactions, the viability condition (28) now reads

$$C < B R^0. \tag{39}$$

This is a fully standard Hamilton rule, with $R^0$ independent of $B$ and $C$, being a function only of the population structure, through $n$ and $m$. The critical values are therefore

$$R^0_s = C/B, \quad \text{or, equivalently,} \quad m_s = 1 - \sqrt{\frac{Cn}{B + C(n-1)}} \approx \frac{B - C}{2Cn}, \tag{40}$$

where we used (37), and the approximate result is valid when $n$ is large.
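The inversion (37) and the critical values in (40) are easy to check numerically. The following sketch, with illustrative function names and parameter values that are assumptions of this example, computes $m_s$ for a linear model and verifies that plugging it back into (35) returns $C/B$.

```python
import math


def relatedness(n: int, m: float) -> float:
    """R0 as a function of m, formula (35)."""
    s2 = (1.0 - m) ** 2
    return s2 / (n - (n - 1) * s2)


def migration_from_relatedness(n: int, r0: float) -> float:
    """Inverse map (37): m = 1 - sqrt(n R0 / (1 + (n-1) R0))."""
    return 1.0 - math.sqrt(n * r0 / (1.0 + (n - 1) * r0))


def critical_migration(n: int, b: float, c: float) -> float:
    """Exact critical migration rate in (40), for linear payoffs."""
    return 1.0 - math.sqrt(c * n / (b + c * (n - 1)))


def critical_migration_large_n(n: int, b: float, c: float) -> float:
    """Large-n approximation in (40): (B - C) / (2 C n)."""
    return (b - c) / (2.0 * c * n)


if __name__ == "__main__":
    n, b, c = 100, 3.0, 1.0  # illustrative values
    m_s = critical_migration(n, b, c)
    print(m_s, critical_migration_large_n(n, b, c))
    print(relatedness(n, m_s), c / b)
```

With $n = 100$, $B = 3$ and $C = 1$, the exact and approximate values of $m_s$ are both close to $0.01$.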

Condition (39) is an expression in our setting of the common wisdom according to which Hamilton's rule in its standard form applies when selection is weak and interactions are pairwise and additive (Example 4, in our setting). In this regard, there are several points worth commenting.

First, (39) holds with no assumption on $v^N_k$, so that it goes beyond that common wisdom.

Second, as explained after (28), we are only considering a viability condition for an initial situation with a single (or for this matter, a number $\ll g$) of type A individuals. Even in the case of weak selection, and with pairwise additive interactions (so that $v^A_k = -C + (k-1)B/(n-1)$, $v^N_k = kB'/(n-1)$), when the numbers of type A and type N are comparable to $g$, the condition for selection to increase the frequency of type A is given by Queller's rule (29), that now reads

$$C < B R^0 + D (1 - R^0) p, \tag{41}$$

where, as before, $D = B' - B$ and $p$ is the fraction of type A in the population.

Third, the assumption of weak selection greatly simplifies relatedness, and makes it more universal, but otherwise, also under strong selection, (28) has the usual Hamilton rule form.

Fourth, in our setting it is not the pairwise additive nature of the interaction that matters. What matters for (28) and (39) to hold is linearity of $v^A_k$ in $k$ (and for (29) and (41) to also hold, additional linearity of $v^N_k$ in $k$), so that one can use (19) and (20). In Example 4, we explained that this sort of linearity is formally equivalent to having pairwise additive interactions, but it can also result from interactions involving many individuals at the same time, as in Example 1.

Fifth, the meaning of $C$ and $B$ in (39) has to be carefully understood. This point is well illustrated by considering the fitnesses in (11), that correspond to the case of pairwise interactions in iterated prisoner's dilemmas, with types N always defecting and types A playing tit-for-tat. In this case $C = c(n-1)$ and $B = ((b - c)T + c)(n-1)$, where $c$ is the cost to an actor each time it cooperates, $b$ is the benefit that an actor provides to its partner each time it cooperates, and $T$ is the average number of iterations of each pairwise interaction. The viability condition (39) now reads $c < ((b - c)T + c) R^0$ and we have

$$R^0_s = \frac{c}{(b - c)T + c}. \tag{42}$$

For $T = 1$ we have $R^0_s = c/b$, but $R^0_s$ decreases as $T$ increases. Nothing here is surprising, but these computations illustrate the fact that $C$ and $B$ are life-cycle costs and benefits, and not costs and benefits in each momentary interaction (even if those are constant, as in the present case).

In contrast to (39), the weak selection form of the viability condition (30), for the threshold model, Example 3, reads

$$C < A \sum_{k \ge \theta} \pi_k. \tag{43}$$

The quantity $\sum_{k \ge \theta} \pi_k$ depends only on $n$ and $m$. Thanks to (37), it can also be seen as a function of $n$ and $R^0$ only. But we see no hope in expressing this functional dependence in simple terms, that would allow us to derive a simple expression for $R^0_s$ in this example. We will, nevertheless, obtain interesting approximations for it in Section 5. We will also be able there to compute the exact limit of $R^0_s$ for this example, as $n \to \infty$, provided that $\theta = \lceil \tilde\theta n \rceil$ for some constant $0 < \tilde\theta < 1$.

To apply the viability condition (33), we need to compute $\pi$. One solution is to find $\nu^0$, and use (32). And $\nu^0$ can be computed as the Perron-Frobenius left-eigenvector of $M^0(A + B)$, where

$$(M^0)_{k,k'} = \mathbb{P}(\mathrm{Bin}(n, k/n) = k').$$

This yields, after some simplifications,

$$(M^0(A + B))_{k,j} = \begin{cases} \mathbb{P}(\mathrm{Bin}(n, (1-m)k/n) = j) + mk, & \text{if } j = 1, \\ \mathbb{P}(\mathrm{Bin}(n, (1-m)k/n) = j), & \text{if } j = 2, \dots, n. \end{cases} \tag{44}$$

Moreover, one can use the fact that $\rho^0 = 1$ to further simplify the computation of $\nu^0$.

One can also use, alternatively, methods from coalescence theory to study the distribution $\pi$. But because we find these methods quite cumbersome for this purpose, we introduce a further alternative approach, illustrated in Fig. 10, and explained next.

Consider again the random experiment of choosing a focal individual from generation $t$. Denote by $\mathcal{F}_u$, $u = 0, \dots, t$, the ancestor of the focal in generation $u$, so that, in particular, $\mathcal{F}_t$ is the focal. Denote by $\mathcal{G}_u$ the group to which $\mathcal{F}_u$ belonged. Then denote by $K^I_u$ the number of members of $\mathcal{G}_u$ that were IBD to the focal's ancestor $\mathcal{F}_u$ (including $\mathcal{F}_u$ itself). Our basic observation is that, when $\delta = 0$, so that there is no selection, the sequence of random variables $K^I_0, K^I_1, \dots, K^I_t$ forms a time-stationary Markov chain on the set $\{1, \dots, n\}$, described as follows. Given the value of $K^I_{u-1}$, the value of $K^I_u$ is:

(MC1) With probability $m$, set $K^I_u$ to 1. [Migration event in focal's line of descent.]

(MC2) With probability $1 - m$, set $K^I_u$ equal to $1 + \mathrm{Bin}(n-1, (1-m)K^I_{u-1}/n)$. [No migration event in focal's line of descent.]

This corresponds to a transition matrix $Q$ (that also does not depend on $t$) given by:

$$Q_{i,j} = \begin{cases} m + (1-m)\,\mathbb{P}(\mathrm{Bin}(n-1, (1-m)i/n) = 0), & \text{if } j = 1, \\ (1-m)\,\mathbb{P}(\mathrm{Bin}(n-1, (1-m)i/n) = j-1), & \text{if } j = 2, \dots, n. \end{cases} \tag{45}$$
To understand this claim, first observe that if the ancestor of the focal in generation $u$, $\mathcal{F}_u$, was a migrant, then, by definition of IBD, we have in $\mathcal{G}_u$ no one other than $\mathcal{F}_u$ itself IBD to $\mathcal{F}_u$, and hence $K^I_u = 1$. This corresponds to (MC1) above. On the other hand, if $\mathcal{F}_u$ was not a migrant, then the number, $K^I_u - 1$, of other members of $\mathcal{G}_u$ that were IBD to $\mathcal{F}_u$ is easily obtained as follows. Each one of the $n - 1$ other members of $\mathcal{G}_u$ chose independently a parent from $\mathcal{G}_{u-1}$. The group $\mathcal{G}_{u-1}$ had $K^I_{u-1}$ members that were IBD to $\mathcal{F}_{u-1}$ (with $\mathcal{F}_{u-1}$ included). Members of $\mathcal{G}_u$ other than $\mathcal{F}_u$ became IBD to $\mathcal{F}_u$ if two conditions were satisfied: they had to choose a member of $\mathcal{G}_{u-1}$ that was IBD to $\mathcal{F}_{u-1}$, and they had to stay in the group $\mathcal{G}_u$, rather than migrate out. (If they migrated out, their replacements would not be IBD to $\mathcal{F}_u$, by definition of IBD.) Each one of these $n - 1$ individuals in $\mathcal{G}_u$ had, therefore, independently, probability $(1-m)K^I_{u-1}/n$ of being IBD to $\mathcal{F}_u$. This gives for $K^I_u - 1$ the binomial probability in (MC2). Adding 1, for $\mathcal{F}_u$ itself, gives us the full expression in (MC2).

The Markov chain $K^I_u$, $u = 0, 1, \dots$, starts from $K^I_0 = 1$, and when $u \gg 1$ it will have reached its stationary state $\pi$. (It is clear that the chain is irreducible and aperiodic when $0 < m < 1$, since in this case all entries of $Q$ are strictly positive. In case $m = 0$, it is clear that the chain converges to its single absorbing state $n$, so that $\pi = (0, 0, \dots, 0, 1)$. In case $m = 1$, it is clear that the chain converges to its single absorbing state 1, so that $\pi = (1, 0, 0, \dots, 0)$.) One can compute $\pi$ using stationarity, by solving the linear system of equations

$$\pi Q = \pi, \qquad \sum_k \pi_k = 1. \tag{46}$$

Identity (32), relating $\pi$ and $\nu^0$, can now be alternatively derived by observing that, from (44) and (45), we have $j\,(M^0(A + B))_{k,j} = k\,Q_{k,j}$.
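The linear system (46) can be solved directly for moderate $n$. The following is a minimal sketch in pure Python, assuming the transition matrix (45); the function names and the choice of Gaussian elimination with partial pivoting are illustrative, not the authors' implementation. As a consistency check it recovers $R^0$ from the mean of $\pi$, as in (34) and (35).

```python
from math import comb


def binom_pmf(trials: int, p: float, j: int) -> float:
    """P(Bin(trials, p) = j)."""
    return comb(trials, j) * p**j * (1.0 - p) ** (trials - j)


def transition_matrix(n: int, m: float) -> list:
    """Matrix Q of (45); entry q[i-1][j-1] is the probability of i -> j."""
    q = []
    for i in range(1, n + 1):
        p = (1.0 - m) * i / n
        row = [(1.0 - m) * binom_pmf(n - 1, p, j - 1) for j in range(1, n + 1)]
        row[0] += m  # migration event sends the chain to state 1
        q.append(row)
    return q


def stationary_distribution(q: list) -> list:
    """Solve pi Q = pi, sum(pi) = 1 by Gaussian elimination."""
    n = len(q)
    # Equations (Q^T - I) pi = 0; one of them is redundant, so the
    # last one is replaced by the normalization sum(pi) = 1.
    a = [[q[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = b[r] - sum(a[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = acc / a[r][r]
    return pi


if __name__ == "__main__":
    n, m = 10, 0.1  # illustrative values
    pi = stationary_distribution(transition_matrix(n, m))
    mean = sum(k * p for k, p in enumerate(pi, start=1))
    print((mean - 1) / (n - 1))                           # relatedness from pi
    print((1 - m) ** 2 / (n - (n - 1) * (1 - m) ** 2))    # formula (35)
```

The two printed numbers should coincide up to rounding, since both equal $R^0$.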

The distribution $\pi$ enjoys a nice monotonicity property, as a function of $m$. We recall that a probability distribution $\eta$ over $\{1, \dots, n\}$ is said to be stochastically larger than another one, $\zeta$, if $\sum_{k \ge k_0} \eta_k \ge \sum_{k \ge k_0} \zeta_k$, for $k_0 = 1, \dots, n$. In this case we write $\eta \succeq \zeta$. It is known that this relationship is equivalent to the statement that $\sum_k \eta_k h_k \ge \sum_k \zeta_k h_k$, whenever $h_k$ is increasing in $k$. We claim that $\pi$ is stochastically monotone decreasing in $m$, i.e.,

$$\pi(m) \succeq \pi(m'), \quad \text{whenever } m' \ge m. \tag{47}$$

Therefore, under (C4), the left hand side of (33) is decreasing in $m$. In particular, there can only be one value of $m_s$ that satisfies $\sum_k v^A_k \pi_k(m_s) = 0$. The claim (47) follows from the double observation that $\eta Q(m)$ is stochastically decreasing in $m$ and stochastically increasing in $\eta$. (To see this more easily, consider the description of $Q$ in (MC1) and (MC2) above. Note that with $K^I_{u-1}$ fixed, increasing $m$ decreases $K^I_u$, and that with $m$ fixed, $K^I_u$ is increasing with $K^I_{u-1}$.) We can use the monotonicity in $m$ in this observation to write, whenever $m' \ge m$,

$$\pi(m) = \pi(m) Q(m) \succeq \pi(m) Q(m').$$

Now, we can use the monotonicity in $\eta$ in the observation above to iterate this inequality and obtain $\pi(m) \succeq \pi(m) (Q(m'))^t$, for arbitrary $t$. Letting $t \to \infty$ gives $\pi(m) \succeq \pi(m')$.
The Markov chain introduced above can also be used in other ways, by allowing one to write recursions for quantities of interest. For instance, this method can be used for computing the moments of the distribution $\pi$, $\mathcal{M}_l = \sum_k k^l \pi_k$, $l = 1, 2, \dots$. This is in principle very useful, since an arbitrary $v^A_k$ can always be approximated by a polynomial. So, in theory, one can compute $\mathbb{E}(v^A_{K^I})$ in good approximation and, through (33), compute $m_s$ also in good approximation. We will see in the next section that, in spite of the expressions for the moments $\mathcal{M}_l$ being quite involved, they provide very powerful information. The heuristic nature of the recursive method requires going behind the apparently cryptic expression for $Q$, and instead using its description in items (MC1), (MC2), that appears immediately before $Q$ was introduced. We illustrate the method computing first the mean of $\pi$, $\mathcal{M} = \mathcal{M}_1$. When the Markov chain is stationary, both $K^I_{u-1}$ and $K^I_u$ have distribution $\pi$. Therefore, from (MC1) and (MC2) we obtain:

$$\mathcal{M} = m + (1 - m)\left\{1 + (n-1)(1-m)\mathcal{M}/n\right\}.$$

This yields

$$\sum_k k \pi_k = \mathcal{M} = \frac{n}{n - (n-1)(1-m)^2}. \tag{48}$$

This result could also have been obtained by combining (34) and (35), or alternatively (35) could have been obtained from (34) and (48).

To express the $l$th moment, $\mathcal{M}_l$, in terms of $\mathcal{M}_j$, $j = 1, 2, \dots, l-1$, we will use the following fact. If $X$ has a $\mathrm{Bin}(N, p)$ distribution, then

$$\mathbb{E}(X^l) = \sum_{j=1}^{l} S(l,j)\, N_j\, p^j,$$

where $N_j = N(N-1)\cdots(N-j+1)$, and the Stirling number of the second kind, $S(l,j)$, is the number of ways in which a set with $l$ elements can be partitioned into $j$ non-empty sets. Clearly $S(l,1) = S(l,l) = 1$, for all $l$. They are also known to satisfy

$$S(l,j) = \frac{1}{j!} \sum_{i=0}^{j} (-1)^{j-i} \binom{j}{i} i^l.$$

The first few values of $S(l,j)$ are: $S(1,1) = 1$; $S(2,1) = 1$, $S(2,2) = 1$; $S(3,1) = 1$, $S(3,2) = 3$, $S(3,3) = 1$; $S(4,1) = 1$, $S(4,2) = 7$, $S(4,3) = 6$, $S(4,4) = 1$; $S(5,1) = 1$, $S(5,2) = 15$, $S(5,3) = 25$, $S(5,4) = 10$, $S(5,5) = 1$; ...
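The moment formula for the binomial distribution can be checked with exact rational arithmetic. The sketch below computes $S(l,j)$ from the alternating sum and compares the Stirling-number expression for $\mathbb{E}(X^l)$ with a direct summation over the binomial probabilities. Function names and the test parameters are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(l: int, j: int) -> int:
    """Stirling number of the second kind via the alternating sum (l >= 1)."""
    total = sum((-1) ** (j - i) * comb(j, i) * i**l for i in range(j + 1))
    return total // factorial(j)  # the sum is always divisible by j!


def falling(x: int, j: int) -> int:
    """Falling factorial x (x-1) ... (x-j+1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def binomial_moment(trials: int, p: Fraction, l: int) -> Fraction:
    """E(X^l) for X ~ Bin(trials, p), using Stirling numbers."""
    return sum(stirling2(l, j) * falling(trials, j) * p**j for j in range(1, l + 1))


def binomial_moment_direct(trials: int, p: Fraction, l: int) -> Fraction:
    """E(X^l) by direct summation over the binomial probabilities."""
    return sum(
        k**l * comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    )


if __name__ == "__main__":
    p = Fraction(1, 3)  # illustrative value
    print(binomial_moment(7, p, 4), binomial_moment_direct(7, p, 4))
```

Using `Fraction` avoids rounding, so the two expressions can be compared for exact equality.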

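For our purpose, it is convenient to write $s = 1 - m$, and use the Markov chain description (MC1), (MC2) as it applies to $(K^I_u - 1)^l$: With probability $m$ this quantity takes the value 0, and with probability $s$ it takes the value $(\mathrm{Bin}(n-1, sK^I_{u-1}/n))^l$. Therefore

$$\mathbb{E}\left((K^I_u - 1)^l\right) = s \sum_{j=1}^{l} S(l,j)\,(n-1)_j\, s^j\, \mathbb{E}\left((K^I_{u-1})^j\right)/n^j.$$

Since in equilibrium $K^I_{u-1}$ and $K^I_u$ both have distribution $\pi$, we obtain, with $\mathcal{M}_0 = 1$,

$$\sum_{i=0}^{l} \binom{l}{i} (-1)^{l-i} \mathcal{M}_i = \sum_{j=1}^{l} S(l,j)\,(n-1)_j\, s^{j+1}\, \mathcal{M}_j / n^j.$$

This provides us with the aimed recursion:

$$\mathcal{M}_l = \frac{\sum_{j=1}^{l-1} S(l,j)\,(n-1)_j\, s^{j+1}\, \mathcal{M}_j / n^j - \sum_{i=0}^{l-1} \binom{l}{i} (-1)^{l-i} \mathcal{M}_i}{1 - (n-1)_l\, s^{l+1}/n^l}. \tag{49}$$

To illustrate the use of (49), we insert the value of $\mathcal{M}_1 = \mathcal{M}$, from (48), into the recursion for $\mathcal{M}_2$:

$$\mathcal{M}_2 = \frac{-1 + \left(2 + \frac{n-1}{n} s^2\right)\mathcal{M}_1}{1 - \frac{(n-1)(n-2)}{n^2} s^3} = \frac{-n^2 + \left(2n^2 + n(n-1)s^2\right)\mathcal{M}_1}{n^2 - (n-1)(n-2)s^3} = \frac{n^2\left(n + 2(n-1)s^2\right)}{\left(n^2 - (n-1)(n-2)s^3\right)\left(n - (n-1)s^2\right)}.$$

This gives for the variance of the distribution $\pi$ the value:

$$\mathcal{M}_2 - \mathcal{M}_1^2 = \frac{n^2 (n-1) s^2 \left(n + (n-2)s - 2(n-1)s^2\right)}{\left(n^2 - (n-1)(n-2)s^3\right)\left(n - (n-1)s^2\right)^2}.$$

The expressions above for $\mathcal{M}_2$ and the variance are obviously very involved. They simplify substantially when $n$ is large, since $(n-1)_j/n^j \to 1$ as $n \to \infty$. This is not a surprise, since $n$ does not appear in (MC1), and in (MC2) the binomial random variable converges in distribution to a Poisson random variable, with mean $sK^I_{u-1}$. The resulting Markov chain has state space $\{1, 2, \dots\}$, but it is not hard to show that it is positive recurrent, and so has a single stationary distribution, to which $\pi$ converges as $n \to \infty$. We will not explore this limit here, but rather study, in the next section, a limit in which as $n \to \infty$, also $m \to 0$. As we will see, in this limit $\pi$ simplifies considerably, and leads to a number of very interesting applications.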
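The moment recursion can be run exactly with rational arithmetic. The sketch below is one possible implementation, under the assumption that the recursion is obtained by equating $\mathbb{E}((K^I_u - 1)^l)$, expanded by the binomial theorem, with the Stirling-number expression for the moments of $\mathrm{Bin}(n-1, sK^I_{u-1}/n)$, and solving for $\mathcal{M}_l$. The Stirling numbers are generated here with the standard recurrence $S(l,j) = jS(l-1,j) + S(l-1,j-1)$; all function names are illustrative.

```python
from fractions import Fraction
from math import comb


def stirling2_table(lmax: int) -> list:
    """Table st[l][j] = S(l, j), from S(l,j) = j S(l-1,j) + S(l-1,j-1)."""
    st = [[0] * (lmax + 1) for _ in range(lmax + 1)]
    st[0][0] = 1
    for l in range(1, lmax + 1):
        for j in range(1, l + 1):
            st[l][j] = j * st[l - 1][j] + st[l - 1][j - 1]
    return st


def falling(x: int, j: int) -> int:
    """Falling factorial x (x-1) ... (x-j+1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out


def moments(n: int, m: Fraction, lmax: int) -> list:
    """Return [M_0, ..., M_lmax] for the stationary distribution pi."""
    s = 1 - m
    st = stirling2_table(lmax)
    mom = [Fraction(1)]  # M_0 = 1
    for l in range(1, lmax + 1):
        # Terms with lower moments coming from the binomial part.
        num = sum(
            st[l][j] * falling(n - 1, j) * s ** (j + 1) * mom[j] / Fraction(n) ** j
            for j in range(1, l)
        )
        # Terms with lower moments from expanding (K - 1)^l.
        num -= sum(comb(l, i) * (-1) ** (l - i) * mom[i] for i in range(l))
        den = 1 - falling(n - 1, l) * s ** (l + 1) / Fraction(n) ** l
        mom.append(num / den)
    return mom


if __name__ == "__main__":
    n, m = 10, Fraction(1, 10)  # illustrative values
    mom = moments(n, m, 3)
    print([float(x) for x in mom])
    print(float(mom[2] - mom[1] ** 2))  # variance of pi
```

For $l = 1$ and $l = 2$ the output can be compared against the closed forms for the mean and second moment of $\pi$.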

5 Limit of large n and small m under weak selection 

In this section we will continue to assume that selection is weak and we will study the limit in which

$$n \to \infty, \quad m \to 0 \quad \text{and} \quad nm \to \tilde m, \quad \text{so that} \quad R^0 \to \frac{1}{1 + 2\tilde m} = \tilde R^0, \tag{50}$$

where we used (35). To state the results on the behavior of the distribution $\pi$ in this limit, we suppose that $K^I$ is a random variable with distribution $\pi$, i.e., $\mathbb{P}(K^I = k) = \pi_k$. We state and comment on the results in (a) and (b) below, and afterwards explain how to do the computations. (See also Fig. 12.)



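(a) If $0 \le \tilde m < \infty$, then for $l = 1, 2, \dots$,

$$\frac{\mathcal{M}_l}{n^l} \to \frac{l!}{(2\tilde m + 1)(2\tilde m + 2)\cdots(2\tilde m + l)} = \tilde{\mathcal{M}}_l. \tag{51}$$

This implies (see, e.g., Section 2.3.e of [11]) that the random variable $K^I/n$ converges in distribution to a distribution with $l$th moment $\tilde{\mathcal{M}}_l$.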

When $0 < \tilde m < \infty$, these moments characterize a Beta distribution, with parameters $\alpha = 1$ and $\beta = 2\tilde m$, which has density $f_{\tilde m}(x) = 2\tilde m\,(1 - x)^{2\tilde m - 1}$, $0 < x < 1$. In other words,

$$\mathbb{P}(K^I/n > x) \to \int_x^1 f_{\tilde m}(x')\,dx' = (1 - x)^{2\tilde m}, \tag{52}$$

for arbitrary $0 \le x \le 1$. Notice that, for each $x$, this tail probability, $(1 - x)^{2\tilde m}$, is decreasing in $\tilde m$, meaning that the corresponding family of Beta distributions is stochastically monotone decreasing in $\tilde m$.

The case $\tilde m = 1/2$, which has $\tilde R^0 = 1/2$, is particularly simple. In this case the limiting distribution, Beta(1,1), is the uniform distribution between 0 and 1, with $f_{\tilde m}(x) = 1$, $0 < x < 1$. This case can be seen as separating two qualitatively distinct cases: When $0 < \tilde m < 1/2$, the density $f_{\tilde m}(x)$ is increasing in $0 < x < 1$, while when $1/2 < \tilde m < \infty$, the density $f_{\tilde m}(x)$ is decreasing in $0 < x < 1$. In the extreme cases, in which $\tilde m$ is close to 0 or very large, the density $f_{\tilde m}(x)$ concentrates, respectively, close to 1 or 0.

When $\tilde m = 0$, we have $\tilde{\mathcal{M}}_l = 1$, for each $l = 1, 2, \dots$. These are the moments of a degenerate random variable, that takes the value 1 with probability 1. Therefore the random variable $K^I/n$ converges in probability to 1.



(b) If $\tilde m = \infty$, we have for $l = 1, 2, \dots$,

$$\frac{\mathcal{M}_l}{n^l} \to 0. \tag{53}$$

These are the moments of a degenerate random variable, that takes the value 0 with probability 1. Therefore the random variable $K^I/n$ converges in probability to 0.

But a different way of scaling $K^I$ provides a non-degenerate limit. We have, for $l = 1, 2, \dots$,

$$m^l \mathcal{M}_l \to \frac{l!}{2^l}, \tag{54}$$

i.e., $mK^I$ converges in distribution to a random variable with $l$th moment $l!/2^l$. These moments characterize an exponential distribution with mean 1/2. In other words,

$$\mathbb{P}(mK^I > x) \to \exp(-2x), \tag{55}$$

for arbitrary $x \ge 0$.
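To get a feeling for how good the Beta limit is at finite $n$, one can compare the exact tail of $\pi$ with the limiting tail $(1-x)^{2nm}$. The sketch below does this in pure Python, assuming the Markov chain (MC1), (MC2) for the number of group members IBD to the focal; the helper names, the linear solver and the parameter values ($n = 100$, $m = 0.01$) are illustrative choices.

```python
from math import comb


def binom_pmf(trials: int, p: float, j: int) -> float:
    """P(Bin(trials, p) = j)."""
    return comb(trials, j) * p**j * (1.0 - p) ** (trials - j)


def transition_matrix(n: int, m: float) -> list:
    """Chain on {1..n}: reset to 1 w.p. m, else 1 + Bin(n-1, (1-m) i / n)."""
    q = []
    for i in range(1, n + 1):
        p = (1.0 - m) * i / n
        row = [(1.0 - m) * binom_pmf(n - 1, p, j - 1) for j in range(1, n + 1)]
        row[0] += m
        q.append(row)
    return q


def stationary_distribution(q: list) -> list:
    """Solve pi Q = pi with sum(pi) = 1 by Gaussian elimination."""
    n = len(q)
    a = [[q[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[n - 1] = [1.0] * n  # replace a redundant equation by the normalization
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(a[r][c] * pi[c] for c in range(r + 1, n))) / a[r][r]
    return pi


def exact_tail(pi: list, x: float) -> float:
    """P(K/n > x) under pi, where pi[k-1] = P(K = k)."""
    n = len(pi)
    return sum(p for k, p in enumerate(pi, start=1) if k / n > x)


def beta_tail(x: float, m_tilde: float) -> float:
    """Limiting tail (1 - x)^(2 m_tilde) of the Beta(1, 2 m_tilde) law."""
    return (1.0 - x) ** (2.0 * m_tilde)


if __name__ == "__main__":
    n, m = 100, 0.01  # so that n m = 1
    pi = stationary_distribution(transition_matrix(n, m))
    for x in (0.25, 0.5, 0.75):
        print(x, exact_tail(pi, x), beta_tail(x, n * m))
```

The exact and limiting tails printed for each $x$ should be close, with the discrepancy shrinking as $n$ grows with $nm$ held fixed.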

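The claims in (51), (53) and (54), about the convergence of the scaled moments, can be readily obtained by induction in $l = 1, 2, \dots$, from (48) and (49), and the following two observations. First, for $j = l - 1$,

$$S(l, l-1) = \binom{l}{2} = \frac{l(l-1)}{2}.$$

Second, the denominator in (49) can be rewritten as

$$1 - \frac{(n-1)_l}{n^l}\, s^{l+1} = 1 - \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{l}{n}\right)(1 - m)^{l+1} = \frac{1 + 2 + \cdots + l}{n} + (l+1)m + \Delta(n,m) = \frac{l(l+1)}{2n} + (l+1)m + \Delta(n,m),$$

where $\Delta(n,m)/m \to 0$ and $\Delta(n,m)\,n \to 0$.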

A question that comes naturally to mind is whether for the threshold model in Example 3, we can find the exact value of $m_s$, in the case of weak selection. Unfortunately we have not been able so far to compute the exact value of $\sum_{k \ge \theta} \pi_k$. We can, nevertheless, use the result in (52) above, that states that when $n$ is large and $m$ is small, then $\sum_{k \ge \theta} \pi_k \approx (1 - \theta/n)^{2nm}$. Using this approximation in combination with (43) yields

$$m_s \approx \frac{\log(C/A)}{2n \log(1 - \theta/n)}. \tag{56}$$

This approximation should be good when $n$ is large, and $\theta/n$ is not too close to 0, so that the resulting value of $m_s$ is small. When, additionally, $\theta/n$ is substantially smaller than 1, we can further approximate

$$m_s \approx \frac{1}{2\theta} \log\left(\frac{A}{C}\right). \tag{57}$$

See Fig. 14 for a comparison of the exact value of $m_s$ under weak selection, from the viability condition (33) (or, equivalently, (43)), and the approximations (56) and (57). Surprisingly, (56) gives a good approximation there even when $\theta$ is small.
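The two approximations for the critical migration rate in the threshold model are simple enough to evaluate directly. The sketch below is only an illustration, with hypothetical function names and parameter values; it also checks that the first approximation makes the approximate tail $(1-\theta/n)^{2nm}$ equal to $C/A$.

```python
import math


def m_s_beta(n: int, theta: int, c: float, a: float) -> float:
    """Approximation log(C/A) / (2 n log(1 - theta/n))."""
    return math.log(c / a) / (2.0 * n * math.log(1.0 - theta / n))


def m_s_small_threshold(theta: int, c: float, a: float) -> float:
    """Further approximation log(A/C) / (2 theta), for theta/n well below 1."""
    return math.log(a / c) / (2.0 * theta)


if __name__ == "__main__":
    n, theta, c, a = 100, 20, 1.0, 2.0  # illustrative values
    print(m_s_beta(n, theta, c, a))
    print(m_s_small_threshold(theta, c, a))
```

With $n = 100$, $\theta = 20$ and $C/A = 1/2$, the two values are roughly $0.0155$ and $0.0173$; the gap reflects the error of replacing $\log(1 - \theta/n)$ by $-\theta/n$ when $\theta/n = 0.2$.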

The approximation (56) for Example 3 can be formalized and extended to a wide class of models in the following way. (See Fig. 16 for an illustration.) Suppose that there is a piecewise continuous and bounded function $\tilde v^A_x$, $0 \le x \le 1$, such that

$$\max_{1 \le k \le n} \left| v^A_k - \tilde v^A_{k/n} \right| \to 0, \quad \text{as } n \to \infty. \tag{58}$$

Then, the result in (52) above, combined with the continuous mapping theorem, (2.3) in Section 2.2.b of [11], implies that, when $0 < \tilde m < \infty$,

$$\sum_k v^A_k \pi_k \to \int_0^1 \tilde v^A_x f_{\tilde m}(x)\,dx = 2\tilde m \int_0^1 \tilde v^A_x (1 - x)^{2\tilde m - 1}\,dx = \tilde V^A(\tilde m), \tag{59}$$

in the limit (50), where the last equality introduces a new notation. This means that, in this limit, the viability condition (33) now reads

$$\tilde V^A(\tilde m) > 0, \quad \text{or, equivalently,} \quad \int_0^1 \tilde v^A_x (1 - x)^{2\tilde m - 1}\,dx > 0. \tag{60}$$

We want to define $\tilde m_s$ as the solution of

$$\tilde V^A(\tilde m_s) = 0. \tag{61}$$

To assure that this equation has a solution it is sufficient to assume that $\tilde v^A_x$ is continuous at the end-points $x = 0$ and $x = 1$. In this case, under (C1),

$$\tilde V^A(\tilde m) \to \tilde v^A_0 < 0,$$

as $\tilde m \to \infty$. And under (C2),

$$\tilde V^A(\tilde m) \to \tilde v^A_1 > 0,$$

as $\tilde m \to 0$. Since clearly $\tilde V^A(\tilde m)$ is continuous in $\tilde m$, (61) then must have a solution $0 < \tilde m_s < \infty$. If there is more than one solution, then we define $\tilde m_s$ as the largest one. If condition (C4) holds, then $\tilde v^A_x$ is non-decreasing in $0 \le x \le 1$. In this case, the stochastic monotonicity of the Beta distributions in (a) implies that $\tilde V^A(\tilde m)$ is non-increasing in $\tilde m$. Actually, it is clear, from the behavior of the density $f_{\tilde m}(x) = 2\tilde m\,(1 - x)^{2\tilde m - 1}$, that $\tilde V^A(\tilde m)$ is then strictly decreasing in $\tilde m$, unless $\tilde v^A_x$ is constant. Hence, when $\tilde v^A_x$ is continuous at 0 and 1 and (C1), (C2) and (C4) hold, (61) defines $\tilde m_s$ uniquely.

The viability condition (60) implies that if we consider a model that satisfies (58), then in the weak selection regime,

$$n m_s \to \tilde m_s \quad \text{and} \quad R^0_s \to \frac{1}{1 + 2\tilde m_s} = \tilde R^0_s, \quad \text{as } n \to \infty. \tag{62}$$

Example 3 satisfies (58), if we take $\theta = \lceil \tilde\theta n \rceil$, for some $0 < \tilde\theta < 1$, where $\lceil y \rceil$ is the smallest integer larger than or equal to $y$. In this case, $\tilde v^A_x = -C$, for $0 \le x < \tilde\theta$, and $\tilde v^A_x = -C + A$, for $\tilde\theta \le x \le 1$. This yields $\tilde V^A(\tilde m) = -C + A(1 - \tilde\theta)^{2\tilde m}$, and therefore

$$\tilde m_s = \frac{\log(C/A)}{2\log(1 - \tilde\theta)}, \quad \text{or, equivalently,} \quad \tilde R^0_s = \frac{1}{1 + \dfrac{\log(C/A)}{\log(1 - \tilde\theta)}}.$$

The approximation (56) can be seen now as a special case of (62).



Example 2 also satisfies (58), if we take $a = \lfloor n \tilde a \rfloor$, for some $0 \le \tilde a \le 1$, where $\lfloor y \rfloor$ is the largest integer smaller than or equal to $y$ (the integer part of $y$). In this case, $\tilde v^A_x = -C + Bx$, for $0 \le x \le \tilde a$, and $\tilde v^A_x = T(-C + Bx)$, for $\tilde a < x \le 1$. (See Fig. 16.) Condition (12) is satisfied, when $n$ is large, in case $\tilde a > C/B$, and fails if $\tilde a < C/B$. In case $\tilde a = C/B$, (12) may be satisfied or fail, depending on whether $n\tilde a$ is close to $a$ or $a + 1$, but in either case, the left hand side of (12) is of order $1/n$, so that it is only marginally satisfied or violated. In our analysis below, we will not assume that (12) holds. We have, for this model,

$$\tilde V^A(\tilde m) = -C + \frac{B}{2\tilde m + 1} + (T - 1)(B\tilde a - C)(1 - \tilde a)^{2\tilde m} + (T - 1)\,\frac{B\,(1 - \tilde a)^{2\tilde m + 1}}{2\tilde m + 1}.$$

Equation (61) does not lead in this case to a simple expression for $\tilde m_s$, as it did in Example 3. We can nevertheless still derive very useful information from it. To simplify the resulting equation, set $R = \tilde R^0_s = 1/(2\tilde m_s + 1)$. Then (61) reads

$$C - BR = (T - 1)\left\{ BR + \frac{B\tilde a - C}{1 - \tilde a} \right\}(1 - \tilde a)^{1/R}, \tag{63}$$

except in the trivial case $\tilde a = 1$, in which the right hand side of (63) is 0. This case corresponds to $a = \lfloor n\tilde a \rfloor = n$, so that types A cooperate only in the first round. Obviously then, Example 2 reduces to Example 1, and indeed, (63) reduces to (39).

In the opposite extreme, when $\tilde a = 0$, so that $a = \lfloor n\tilde a \rfloor = 0$ and types A cooperate in each round of the game, the right hand side of (63) reduces to $(T - 1)(BR - C)$, so that (63) becomes $T(C - BR) = 0$. So (63) is again reduced to (39), as one should expect.

Note that also when $T = 1$, Example 2 reduces to Example 1, and (63) reduces to Hamilton's rule (39). One can see the right hand side of (63) as a correction to that form of the viability condition in the more general Example 2.

Fig. 15 compares the exact values of $m_s$ and $R^0_s$ for instances of Example 2, under weak selection, from the viability condition (33), and the approximation provided by solving (63). Notice in these graphs that $R^0_s$ is significantly smaller than $C/B$ when $\tilde a$ is close to $C/B$.

While somewhat intimidating at first sight, (63) provides good information about the critical value $\tilde R^0_s$. We first state the main features of its behavior, and then explain how to do the computations leading to these claims. Of special importance is the case in which $\tilde a = C/B = \min\{0 \le x \le 1 : \tilde v^A_x \ge 0\}$. In this case (63) simplifies to:

$$C - BR = (T - 1)\,BR\,(1 - C/B)^{1/R}. \tag{64}$$

With $C$, $B$ and $T$ fixed, $\tilde R^0_s$ is a continuous function of $0 \le \tilde a \le 1$, that takes the value $C/B$ on both end-points of this domain, is strictly decreasing when $0 \le \tilde a \le C/B$, and strictly increasing on $C/B \le \tilde a \le 1$. It reaches therefore its minimum at $\tilde a = C/B$, where it solves (64). It is not surprising that when $C/B \le \tilde a < 1$, so that (12) is satisfied, we have $\tilde R^0_s < C/B$, since in this case, when altruists continue cooperating, it is always in their interest to do so. But it is somewhat surprising that also when $0 < \tilde a < C/B$, we have $\tilde R^0_s < C/B$. It is intuitive that $\tilde R^0_s$ should reach its minimal value when $\tilde a = C/B$, since then altruists are persisting precisely when they should.

With $T$ and $\tilde a$ fixed, $\tilde R^0_s$ is a decreasing function of $B/C$, that goes to 0 as $B/C \to \infty$ (Fig. 17). With $C$, $B$ and $0 < \tilde a < 1$ fixed, $\tilde R^0_s$ is a decreasing function of $T$ that behaves as follows when $T \to \infty$. If $0 < \tilde a < C/B$, then $\tilde R^0_s \to (C - B\tilde a)/(B - B\tilde a)$, while if $C/B \le \tilde a < 1$, then $\tilde R^0_s \to 0$. In this latter case, the convergence is rather slow, in that $\tilde R^0_s$ behaves asymptotically as $(\log(1/(1 - \tilde a)))/\log(T)$. (See Fig. 18.)

The claims above follow from the behavior of the left hand side and the right hand side of (63). We denote them respectively by $H(R) = H_{C,B}(R)$ and $G(R) = G_{C,B,T,\tilde a}(R)$. (With $G(0) = 0$, so that $G$ is continuous on the interval $[0, 1]$. Note that not only $G(R)$, but also all its derivatives converge to 0 as $R \to 0$.) See Fig. 19 for an illustration of what follows. The function $H(R)$ is very straightforward; it is a strictly decreasing function of $R$, that is positive for $R < C/B$ and negative for $R > C/B$. The function $G(R)$ has the sign of the term inside the curly braces. That term inside the curly braces is 0 when $R = (C - B\tilde a)/(B - B\tilde a)$. We define $\hat R = \max\{(C - B\tilde a)/(B - B\tilde a), 0\}$, and observe that $0 < \hat R < C/B$, when $\tilde a < C/B$, and $\hat R = 0$, when $\tilde a \ge C/B$. The term inside the curly braces, and therefore also $G(R)$, is negative for $R < \hat R$, and positive for $R > \hat R$. Notice also that once it is positive, $G(R)$ is strictly increasing in $R$, since it is then the product of strictly increasing positive functions. The behaviors described so far for $H(R)$ and $G(R)$ immediately imply that they are equal to each other at exactly one point $R = \tilde R^0_s$, and that this point is in the open interval $(\hat R, C/B)$.
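Since $H - G$ changes sign exactly once between the zero of the bracketed term and $C/B$, the critical relatedness can be located by bisection. The sketch below assumes the limiting equation $C - BR = (T-1)\{BR + (B\tilde a - C)/(1 - \tilde a)\}(1 - \tilde a)^{1/R}$ for the iterated public goods model; the function names and parameter values are illustrative, and the right hand side is rewritten as $(T-1)\{BR(1-\tilde a)^{1/R} + (B\tilde a - C)(1-\tilde a)^{1/R - 1}\}$ to avoid dividing by zero at $\tilde a = 1$. It requires $0 < C < B$.

```python
def h(r: float, b: float, c: float) -> float:
    """Left hand side: H(R) = C - B R."""
    return c - b * r


def g(r: float, b: float, c: float, t: float, a: float) -> float:
    """Right hand side G(R), with G(0) = 0 by convention."""
    if r <= 0.0:
        return 0.0
    base = 1.0 - a
    return (t - 1.0) * (
        b * r * base ** (1.0 / r) + (b * a - c) * base ** (1.0 / r - 1.0)
    )


def critical_relatedness(b: float, c: float, t: float, a: float,
                         tol: float = 1e-12) -> float:
    """Root of H(R) = G(R) in the interval [R_hat, C/B], by bisection."""
    r_hat = max((c - b * a) / (b - b * a), 0.0) if a < 1.0 else 0.0
    lo, hi = r_hat, c / b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # H - G is positive to the left of the root and negative to its right.
        if h(mid, b, c) - g(mid, b, c, t, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b, c = 2.0, 1.0  # illustrative values, C/B = 0.5
    for t in (1.0, 2.0, 10.0, 100.0):
        print(t, critical_relatedness(b, c, t, a=0.5))
```

For $T = 1$ the routine returns $C/B$, and the printed values decrease as $T$ grows, in line with the qualitative behavior described above.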

The various claims about the behavior of $\tilde R^0_s$ as a function of $\tilde a$, or of $B/C$, or of $T$, follow now from analyzing the behavior of the graphs of $H(R)$ and of $G(R)$, as these parameters change. We have $\partial G_{C,B,T,\tilde a}(R)/\partial \tilde a = (T - 1)(C - B\tilde a)((1/R) - 1)(1 - \tilde a)^{(1/R) - 2}$, which, regardless of the value of $R$, is positive for $0 \le \tilde a < C/B$ and negative for $C/B < \tilde a < 1$. This means that the graph of $G(R)$ moves upwards, as $\tilde a$ increases from 0 to $C/B$, and then moves downwards, as $\tilde a$ increases from $C/B$ to 1. In the extremes, $G(R) \to (T - 1)(BR - C)$, as $\tilde a \to 0$, for all $R > 0$. (This convergence is not uniform, since the function $G(R)$ converges to 0 as $R \to 0$.) And $G(R) \to 0$, uniformly in $R$, as $\tilde a \to 1$. These facts, and the trivial behavior of $H(R)$, which does not depend on $\tilde a$, provide us with the facts about the dependence of $\tilde R^0_s$ on $\tilde a$.

The fact that $\tilde R^0_s$ depends on $C$ and $B$ only through $B/C$ can be seen by dividing both sides of (63) by $C$. The fact that $\tilde R^0_s$ decreases as $B/C$ increases follows easily from observing that the graphs of $H(R)$ and $G(R)$ move, respectively, down and up, as $B$ increases, with $C$ fixed. And the fact that $\tilde R^0_s \to 0$, as $B/C \to \infty$, is immediate from $0 < \tilde R^0_s < C/B$ (Fig. 17).

If we keep $B$, $C$ and $0 < \tilde a < 1$ fixed and let $T \to \infty$, then $G(R)$ also goes monotonically to $\infty$, for $\hat R < R < C/B$, and stays at 0 for $R = \hat R$. The corresponding behavior of the graph of $G(R)$, and the fact that $\tilde R^0_s$ is the point in $\hat R < R < C/B$ where this graph intersects the graph of the decreasing function $H(R)$, shows that, as claimed above, $\tilde R^0_s$ is decreasing in $T$, and as $T \to \infty$, $\tilde R^0_s \to \hat R$ (Fig. 20). The claim about the slow speed of this convergence, in case $C/B \le \tilde a < 1$, can be obtained by taking the logarithm of both sides of (63) and analyzing how the resulting terms behave as $T \to \infty$.

It is interesting to contrast (42), for the iterated pairwise prisoner's dilemma with types N defecting and types A playing tit-for-tat (Example 4, (11)), with our analysis above of the behavior of $\tilde R^0_s$ in Example 2, which is an iterated public goods game (an analogue of the prisoner's dilemma in a multi-individual setting), with types N defecting and types A playing many-individual tit-for-tat. There are several expected similarities in the behavior of $R^0_s$ in both cases, as a function of costs, benefits and expected number of repetitions of the game. But there are also important differences to emphasize. There are differences in the details of the behavior. For instance, $R^0_s$ goes to 0, as $T \to \infty$, much more slowly in the case of Example 2, when it does go to 0. But equally important, we want to mention the differences in the level of complexity of the analysis in each case. While (42) holds for arbitrary $n$ and resulted from a standard Hamilton rule, (39), our analysis of Example 2 above relied on the much more elaborate results developed in this section, and depended on $n$ being large. One of the main messages of the current paper is that when interactions involve several individuals at a time in the groups, one needs methods that go beyond those that apply to pairwise additive interactions.

It is important to explain why the assumptions made in [7] on how much assortment
to expect in Example 2, and later used in several papers, including [6], are excessively
pessimistic. In [7] it was supposed that, conditioned on the focal individual being type
A, the other $n - 1$ members of the focal group would be type A independently, with
probability $\mathbb{P}(A_2 \mid A_1)$. In our framework, given that the focal is type A, learning that the
co-focal is also type A further increases the conditional probability that a third member of
the group is type A. Learning then that a third individual is type A again further increases
the probability that a fourth individual is type A, and so on. This is so because the
information being successively provided keeps increasing the probability that there were
several types A in the previous generation in the group from which the focal descends.
It is very interesting to make the computation of $R^0$ using the assumption of [7] and
compare the result with what we have obtained from (63). Under that assumption of
conditional independence, the viability condition (33) for the regime of weak selection
would be replaced with $\sum_k v_k\, \mathbb{P}(\mathrm{Bin}(n-1, R^0) = k-1) = 0$. Assume that (58) holds and
define $\tilde v^*_x$ as $\tilde v_x$, when $x$ is a continuity point of $\tilde v_x$, and as the average between the limits
of $\tilde v_x$ from the left and from the right at $x$, otherwise. (We are supposing that these limits
exist, as is the case in Example 2.) Then, by a central limit theorem, in the limit (50),




$$\sum_k v_k\, \mathbb{P}(\mathrm{Bin}(n-1, R^0) = k-1) \;\to\; \tilde v^*_{R^0}. \qquad (64)$$
(Compare with (59).) Instead of (61), we would then have

$$\tilde v^*_{R^0} = 0. \qquad (65)$$

For Example 2, the only solution of (65) is $R^0 = C/B$, regardless of the value of $\tilde a$. This
result would not depend on $T$, and would differ substantially from our result in case, e.g.,
$\tilde a > C/B$, $T$ large, which has $R^0 \ll C/B$. (See Fig. 15, Fig. 17, Fig. 18 and Fig. 20.)
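The effect of the conditional-independence assumption can be checked for finite $n$ with a short computation. The sketch below solves $\sum_k v_k\, \mathbb{P}(\mathrm{Bin}(n-1, R) = k-1) = 0$ by bisection. The payoff used for Example 2 is an assumption made here for illustration (a per-round payoff $B(k-1)/(n-1) - C$ to a type A in a group with $k$ types A, repeated $T$ times when $k \ge a$ and once otherwise); the exact definition in the paper may differ, and all names are hypothetical.

```python
import math

def binom_pmf(j, N, p):
    # P(Bin(N, p) = j), computed in log space to stay stable for large N.
    if p <= 0.0:
        return 1.0 if j == 0 else 0.0
    if p >= 1.0:
        return 1.0 if j == N else 0.0
    log_pmf = (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
               + j * math.log(p) + (N - j) * math.log(1.0 - p))
    return math.exp(log_pmf)

def payoff_A(k, n, B, C, T, a):
    # ASSUMED payoff to a type A in a group containing k types A (focal included).
    one_round = B * (k - 1) / (n - 1) - C
    rounds = T if k >= a else 1
    return rounds * one_round

def independent_condition(R, n, B, C, T, a):
    # Left-hand side of the condition under conditional independence.
    return sum(payoff_A(k, n, B, C, T, a) * binom_pmf(k - 1, n - 1, R)
               for k in range(1, n + 1))

def solve_R(n, B, C, T, a, iters=60):
    # The condition is negative at R = 0 and positive at R = 1 when B > C.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if independent_condition(mid, n, B, C, T, a) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    n, B, C, T = 2000, 5.0, 1.0, 100
    for a in (200, 600, 1400):  # a/n = 0.1, 0.3, 0.7
        print(f"a/n={a / n:.1f}  R={solve_R(n, B, C, T, a):.4f}  C/B={C / B:.4f}")
```

Under these assumptions the root stays close to $C/B$ for large $n$, whatever the threshold and the number of repetitions, which is the pessimistic conclusion discussed above.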
[Figure 1 panels: intergroup level, intragroup level, migration.]

Figure 1: Illustration summarizing the two-level Fisher-Wright process with selection and
migration. Here $w(j,a,t)$ is the fitness of individual $j$ of group $a$ in generation $t$, and
$\bar w(a,t)$ is the average fitness of the members of group $a$ in generation $t$. Intergroup level:
in generation $t + 1$ each of the $g$ groups chooses a parent group $a$ from the
previous generation independently with probability proportional to $\bar w(a,t)$. Intragroup
level: each individual inside a child group then independently chooses an ancestor $j$ among
the $n$ individuals in its parent group with probability proportional to $w(j,a,t)$. Migration:
each individual in each group is marked as a migrant with probability $m$; migrants are then
randomly shuffled.
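The three steps in the caption of Figure 1 translate directly into a simulation of one generation. The sketch below is illustrative only: `next_generation` and `public_goods_fitness` are hypothetical names, and the fitness function (1 plus a selection strength times a public goods payoff) is an assumption chosen for the demonstration, not a definition taken from the text.

```python
import numpy as np

def public_goods_fitness(types, B=5.0, C=1.0, delta=0.1):
    # ASSUMED fitness: 1 + delta * payoff, where each individual receives
    # B/(n-1) from every other altruist in its group and altruists pay C.
    n = types.shape[1]
    k = types.sum(axis=1, keepdims=True)          # altruists per group
    payoff = B * (k - types) / (n - 1) - C * types
    return 1.0 + delta * payoff

def next_generation(types, fitness, m, rng):
    """One generation. `types` is a (g, n) integer array: 1 = type A, 0 = type N."""
    g, n = types.shape
    w = fitness(types)                            # w(j, a, t)
    w_bar = w.mean(axis=1)                        # average fitness of each group
    # Intergroup level: each new group picks a parent group, prob. proportional to w_bar.
    parents = rng.choice(g, size=g, p=w_bar / w_bar.sum())
    new_types = np.empty_like(types)
    for child, a in enumerate(parents):
        # Intragroup level: each member picks an ancestor in the parent group,
        # prob. proportional to w(j, a, t).
        ancestors = rng.choice(n, size=n, p=w[a] / w[a].sum())
        new_types[child] = types[a, ancestors]
    # Migration: mark migrants with probability m, then shuffle them among marked slots.
    migrant = rng.random((g, n)) < m
    new_types[migrant] = rng.permutation(new_types[migrant])
    return new_types

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    types = np.zeros((200, 20), dtype=int)
    types[0, 0] = 1                               # a single altruist initially
    for _ in range(50):
        types = next_generation(types, public_goods_fitness, m=0.05, rng=rng)
    print("altruists after 50 generations:", int(types.sum()))
```

Shuffling only the marked individuals keeps every group at size $n$, which matches the fixed group size of the framework.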



[Figure 2: payoff against $k$ (horizontal axis, 1 to 20) in six panels: PG, IPG, THR, GLF, VCB, IG.]



Figure 2: Payoff profiles. Payoffs for the wild ("non-altruist", N) type are represented
as red circles while black squares depict payoffs for the mutant ("altruist", A) type.
From top left: Public goods game (PG, Example 1) for $n = 20$, $C = 1$ and $B = 5$. Iterated
public goods game (IPG, Example 2) for $n = 20$, $C = 1$, $B = 5$, $a = 4$ and $T = 10$.
Threshold model (THR, Example 3) for $n = 20$, $C = 1$, $\theta = 4$ and $A = A' = 10$. General
linear fitness (GLF, Example 4) for $n = 20$, $C = 1$, $B = 5$ and $B' = 2$. Variable costs and
benefits (VCB, Example 5) with $C_k = C/k^{\alpha}$, $B_k = b k^{\alpha}/(1 + d k^{\alpha})$ and $B'_k = b' k^{\alpha}/(1 + d' k^{\alpha})$
for $n = 20$, $C = 1$, $\alpha = 0.5$, $b = b' = 2$, $d = 0.05$, $d' = 0.065$. Iterated game (IG, Example
6) with cost and benefit functions as in the VCB case and $T_k = 1$ if $k \le 5$, $T_6 = 2$, $T_7 = 2.5$
and $T_k = 3$ if $k \ge 8$.

Figure 3: Perron-Frobenius eigenvalues $\rho$ as a function of $m$ for $\delta = 0.1$, 0.2 and 0.4.
From top to bottom: Public goods game (PG, Example 1) with $n = 20$, $C = 1$, $B = 5$.
Iterated public goods (IPG, Example 2) with $n = 20$, $C = 1$, $B = 5$, $a = 8$ and $T = 10$.
Threshold model (THR, Example 3) with $n = 20$, $C = 1$, $\theta = 4$, $A = A' = 10$.
Critical migration values are obtained by solving $\rho(m_s) = 1$. These figures should be,
respectively, compared to Figure 4, Panel B, blue dashed line (PG); Figure 6, Panel B, blue
dashed line (IPG); and Figure 7, Panel A, blue dashed line (THR).

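Solving $\rho(m_s) = 1$ amounts to a one-dimensional root search on the leading eigenvalue of the driving matrix. The sketch below shows the mechanics only. The matrix `toy_matrix` is a made-up $2 \times 2$ example, not the driving matrix of any model in the paper, and the search assumes that $\rho$ decreases in $m$ with $\rho > 1$ at the lower end of the bracket and $\rho < 1$ at the upper end.

```python
import numpy as np

def perron_eigenvalue(M):
    # For a nonnegative matrix the Perron-Frobenius eigenvalue is real and
    # equals the largest real part among the eigenvalues.
    return float(np.max(np.linalg.eigvals(M).real))

def critical_migration(driving_matrix, lo=0.0, hi=1.0, iters=60):
    # Bisection for rho(m) = 1, ASSUMING rho(lo) > 1 > rho(hi) and rho decreasing in m.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if perron_eigenvalue(driving_matrix(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_matrix(m):
    # HYPOTHETICAL stand-in for a driving matrix: a fixed nonnegative matrix
    # damped by the fraction (1 - m) of non-migrants.
    base = np.array([[1.2, 0.1],
                     [0.3, 0.8]])
    return (1.0 - m) * base

if __name__ == "__main__":
    m_s = critical_migration(toy_matrix)
    print(f"critical migration rate for the toy matrix: {m_s:.6f}")
```

For the toy matrix the answer can be checked by hand, since $\rho(m) = (1-m)\rho(0)$ gives $m_s = 1 - 1/\rho(0)$.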
Figure 4: Public goods game (Example 1): Panel A represents critical values $m_s$ as a
function of the strength of selection $\delta$. Curves correspond to the case $C = 1$, $B = 2$ and
$n = 10$ (top, black dotted line), $n = 20$ (middle, blue dashed line) and $n = 50$ (bottom,
magenta full line). Red lines indicate critical values at the weak selection limit obtained
from the viability condition (33), or (40). The inset shows the same curves within the full
range of possible values for $m_s$. Panel B depicts the same conditions except for $B = 5$.

Figure 5: Public goods game (Example 1): Perron-Frobenius eigenvectors $\nu_k$ as a function
of the strength of selection $\delta$ (rows) and of the migration rate parameter $m$ (columns).
Critical migration rates $m_s$ are annotated in each row. Perron-Frobenius eigenvalues $\rho(m)$
are also provided for each case. Histograms represent the case $C = 1$, $B = 2$ and $n = 20$.

Figure 6: Iterated public goods game (Example 2): Critical values $m_s$ as a function of the
strength of selection $\delta$. Panel A depicts the case $n = 20$, $C = 1$, $B = 5$, $a = 4$ with,
respectively from bottom to top, $T = 1$ (dotted black line), $T = 10$ (dashed blue line),
$T = 100$ (dot-dashed magenta) and $T = 500$ (green full line). Panel B depicts the same
conditions except for $a = 8$. Red lines indicate critical values at the weak selection limit
obtained from the viability condition (33).

Figure 7: Threshold model (Example 3): Critical values $m_s$ as a function of the strength of
selection $\delta$. Panel A depicts the case $n = 20$, $C = 1$, $\theta = 4$ with, respectively from bottom
to top, $A = A' = 5$ (dotted black line), $A = A' = 10$ (dashed blue line), $A = A' = 50$
(dot-dashed magenta) and $A = A' = 100$ (green full line). Panel B depicts the same conditions
except for $\theta = 8$. Red lines indicate critical values at the weak selection limit obtained
from the viability condition (33), or (43).

Figure 8: Perron-Frobenius eigenvectors $\nu_k$ for selection strengths $\delta = 0.01$ (left column),
$\delta = 0.3$ (middle column) and $\delta = 0.7$ (right column). Migration rate is set to $m = 0.1$
and group sizes to $n = 20$. Each row represents a different model. The top row, labeled
PG, depicts the Public Goods game (Example 1) with parameters $C = 1$ and $B = 2$. The
Iterated Public Goods game (Example 2) with parameters $C = 1$, $B = 4$, $a = 4$ and $T = 10$
is shown in the middle row, labeled IPG. The bottom row shows Perron-Frobenius
eigenvectors for the Threshold model (THR, Example 3) with $C = 1$, $A = A' = 10$ and
$\theta = 4$. The leftmost column emphasizes that the weak selection limit is independent of
the model. In contrast, when selection is strong, $\nu$ depends on the model, as illustrated
in the other columns.

Figure 9: This diagram illustrates the concept of identity by descent (IBD) as it is employed
in the framework we have introduced. Two individuals $X$ and $Y$ in a given group in
generation $t$, regardless of their type, are identical by descent (IBD) if their lineages, when
followed back in time, coalesce before a migration event (indicated by a dashed arrow in
the right panel of the figure). With a migration rate of $m$, migration typically takes
place within a random number, of order $1/m$, of generations back.

Figure 10: This diagram illustrates the discussion that leads to (45). $K_u$ represents the
number of individuals that are IBD to a focal individual (red circle) in generation $u$.
Two scenarios are discernible for the previous generation $u - 1$. MC1 (left panel): the focal
individual is a migrant. This can happen with probability $m$ and, in the ES for $g \to \infty$,
implies that $K_u = 1$. MC2 (right panel): the focal individual $\mathcal{F}_u$ (red circle) is a child of
$\mathcal{F}_{u-1}$. Each individual in the focal group in generation $u$ chooses a parent from the group
of $\mathcal{F}_{u-1}$ in the previous generation with uniform probability, as $\delta = 0$. With probability
$K_{u-1}/n$ a parent is IBD with the focal individual $\mathcal{F}_{u-1}$ (orange circles) and, consequently,
his children are also IBD with $\mathcal{F}_u$. Additionally, each individual in generation $u$ can migrate
with probability $m$. The number of IBD individuals in generation $u$ is, therefore, the focal
individual himself plus a number of individuals given by a binomial random variable with
probability of success $(1 - m)K_{u-1}/n$ in $n - 1$ trials.

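The recursion described in the caption of Figure 10 defines a Markov chain on $\{1, \dots, n\}$, and its stationary law can be computed by iterating the transition matrix. The sketch below does exactly that under the two scenarios of the caption (with probability $m$ the count resets to 1; otherwise it is 1 plus a binomial with $n - 1$ trials and success probability $(1-m)K_{u-1}/n$). The function names are hypothetical, and no claim is made here about how the result is written in (32).

```python
import numpy as np
from math import comb

def ibd_transition(n, m):
    # P[j-1, k-1] = P(K_u = k | K_{u-1} = j) for j, k in 1..n.
    P = np.zeros((n, n))
    for j in range(1, n + 1):
        p = (1.0 - m) * j / n                     # success probability in scenario MC2
        for k in range(1, n + 1):
            binom = comb(n - 1, k - 1) * p ** (k - 1) * (1.0 - p) ** (n - k)
            P[j - 1, k - 1] = (1.0 - m) * binom   # focal is not a migrant
        P[j - 1, 0] += m                          # scenario MC1: focal is a migrant
    return P

def stationary_ibd(n, m, iters=2000):
    # Power iteration from the uniform distribution.
    P = ibd_transition(n, m)
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

if __name__ == "__main__":
    n, m = 20, 0.1
    pi = stationary_ibd(n, m)
    mean_k = float(np.dot(np.arange(1, n + 1), pi))
    print(f"mean number of IBD individuals (focal included): {mean_k:.4f}")
```

Taking expectations in the recursion gives the stationary mean $1/(1 - (1-m)^2 (n-1)/n)$, which provides a quick consistency check on the numerical output.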
Figure 11: Distribution given by (32), or (16) (bars), compared with $k\nu_k / \sum_{k'} k'\nu_{k'}$,
where $\nu$ is the Perron-Frobenius eigenvector with $\delta = 0.01$ for the Threshold model (THR,
Example 3) with parameters $n = 20$, $C = 1$, $A = A' = 5$ and $\theta = 4$ (red diamonds). The
comparison is repeated for migration rates $m = 0.01$ (top panel) and $m = 0.1$ (bottom
panel).

Figure 12: Limit of large $n$ and small $m$ under weak selection. This figure compares tail
probabilities for the distribution $\pi_k$ provided by (32) (stairs) and for Beta densities with
parameters $\alpha = 1$ and $\beta = 2\tilde m$. Panel A shows the case $n = 20$ for, from top to bottom,
$m = 0.01$ (red dotted line), $m = 0.1$ (blue dashed line) and $m = 0.5$ (black dot-dashed
line). Panel B depicts the same scenarios for the case $n = 100$.



[Figure 13: relatedness (logarithmic vertical axis, $10^{0}$ down to $10^{-7}$) against migration rate (horizontal axis, 0.2 to 1.0); curves for $n = 20$, 50 and 100.]



Figure 13: Relatedness and migration rate under weak selection. For ease of comparison
this figure depicts the relatedness (38) as a function of the migration rate $m_s$ for, from
top to bottom, $n = 20$ (dot-dashed blue line), $n = 50$ (dashed green line) and $n = 100$ (full
red line).

Figure 14: Limit of large $n$ and small $m$ under weak selection for the Threshold model
(Example 3). Panels represent critical migration rates (A and C) and critical relatedness
(B and D) for the Threshold model with $C = 1$ and $A = A' = 10$ as a function of $\tilde\theta = \theta/n$.
Top panels A and B depict the case $n = 20$. Bottom panels C and D depict the case $n = 100$.
In each panel critical values obtained by the viability condition under weak selection (33),
or (43) (viability cond., black full lines) are compared with the approximation for large $n$
and small $m$ given by (56) (approx. 1, dashed blue lines) and with the approximation (57)
that assumes $n$ large, $m$ small and also $\tilde\theta \ll 1$ (approx. 2, dotted red lines).

Figure 15: Limit of large $n$ and small $m$ under weak selection for the Iterated public goods
(IPG) game (Example 2). Panels represent critical migration rates (A and C) and critical
relatedness (B and D) for the IPG with $C = 1$, $B = 5$ and $T = 100$ as a function of
$\tilde a = a/n$. Top panels A and B depict the case $n = 20$. Bottom panels C and D depict the
case $n = 100$. In each panel critical values obtained by the viability condition under weak
selection (33) (viability cond., black full lines) are compared with the approximation for
large $n$ and small $m$ given by solving (63) for $R$ (approx., dashed blue lines). In panel B we
have $R^0 = 4.02\%$ when $\tilde a = 20\%$, and in panel D we have $R^0 = 5.54\%$ when $\tilde a = 20\%$.

Figure 16: Viability condition in the limit of large $n$ and small $m$ under weak selection.
The case illustrated is the Iterated public goods (IPG) game (Example 2) with $C = 1$,
$B = 2$ and $T = 4$. The first column depicts the density $f_{\tilde m}(x)$ for $\tilde m = 1.50, 0.91, 0.40$.
The bottom row represents the payoff function $\tilde v_x$ for $\tilde a = 0.3, 0.5, 0.7$. The grid with nine
plots represents $\tilde v_x f_{\tilde m}(x)$ with $\tilde m$ and $\tilde a$ as specified in each row and column. The viability
condition is given by $V(\tilde m) = \int_0^1 dx\, \tilde v_x f_{\tilde m}(x) > 0$. As $\tilde m$ is decreased, the positive part of
the integrand increases, eventually reaching the critical value $\tilde m_s$ (annotated at the top of
each column). Payoffs are maximized for $\tilde a = C/B$, implying a maximal value for $\tilde m_s$ (in
the case depicted $\tilde m_s = 0.919$). Decreasing $\tilde a$ increases the negative part of $\tilde v_x f_{\tilde m}(x)$ and,
consequently, decreases $\tilde m_s$ (increases $R^0$). Increasing $\tilde a$ decreases the positive part of the
integrand and also decreases $\tilde m_s$.

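The critical value quoted in the caption of Figure 16 can be reproduced with a direct numerical integration. The sketch below rests on two assumptions that are not spelled out in this part of the text: that the density is Beta$(1, 2\tilde m)$, i.e. $f_{\tilde m}(x) = 2\tilde m (1-x)^{2\tilde m - 1}$, and that the scaled payoff for Example 2 is $(Bx - C)(1 + (T-1)\mathbf{1}\{x \ge \tilde a\})$. With these choices the computed root for $C = 1$, $B = 2$, $T = 4$, $\tilde a = 0.5$ lands near the quoted 0.919; function names are hypothetical.

```python
def payoff(x, B, C, T, a):
    # ASSUMED scaled payoff: one round below the threshold a, T rounds at or above it.
    return (B * x - C) * (T if x >= a else 1.0)

def beta_density(x, m_tilde):
    # ASSUMED density: Beta(1, 2 * m_tilde) on (0, 1).
    beta = 2.0 * m_tilde
    return beta * (1.0 - x) ** (beta - 1.0)

def viability(m_tilde, B, C, T, a, steps=4000):
    # Midpoint rule for the integral of payoff * density over (0, 1).
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += payoff(x, B, C, T, a) * beta_density(x, m_tilde)
    return h * total

def critical_m_tilde(B, C, T, a, lo=0.5, hi=2.0, iters=40):
    # Bisection, ASSUMING viability(lo) > 0 > viability(hi) on the chosen bracket.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if viability(mid, B, C, T, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for a in (0.3, 0.5, 0.7):
        m_s = critical_m_tilde(B=2.0, C=1.0, T=4, a=a)
        print(f"a = {a}: critical m_tilde = {m_s:.3f}")
```

The bracket $[0.5, 2.0]$ keeps the Beta exponent at or above 1, so the integrand stays bounded and the midpoint rule is adequate; a wider bracket would call for a more careful quadrature near $x = 1$.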
Figure 17: Limit of large $n$ and small $m$ under weak selection for the Iterated public
goods (IPG) game (Example 2): behavior of solutions for (63) - Part 1. Top panel: $H(R)$
corresponds to the l.h.s. of (63) while $G(R)$ depicts the r.h.s. of (63). $H(R)$ is strictly
decreasing and it is positive for $R < C/B$. Derivatives of $G(R)$ converge to 0 as $R \to 0$.
$H(R)$ and $G(R)$ are equal to each other at exactly one point $R = R^0$ that is a decreasing
function of $B/C$. Curves depicted correspond to the cases $C/B = 0.5$ (full black line),
$C/B = 0.2$ (dashed red line) and $C/B = 0.1$ (dot-dashed blue line) with $\tilde a = C/B$ and
$T = 100$. Bottom panel: $R^0$ as a function of $\tilde a$ for $C/B = 0.5$ (top, full black line),
$C/B = 0.2$ (middle, dashed red line) and $C/B = 0.1$ (bottom, dot-dashed blue line) and
$T = 100$. $R^0$ is continuous in the interval $0 < \tilde a < 1$, takes the value $C/B$ on both
end-points of this domain and has a minimum at $\tilde a = C/B$.

Figure 18: Limit of large $n$ and small $m$ under weak selection for the Iterated public goods
(IPG) game (Example 2): behavior of solutions for (63) - Part 2. Top panel: $G(R)$ and
$H(R)$ for $C/B = 0.5$, $\tilde a = 0.5$ and T = 10 (leftmost, full black line), T = 10^ (dashed red
line) and T = 10^ (dot-dashed blue line). $R^0$ is a decreasing function of $T$. Bottom panel:
in the limit $T \to \infty$, if $0 < \tilde a < C/B$ then $R^0 \to (C/B - \tilde a)/(1 - \tilde a)$ (full magenta line). If
$C/B < \tilde a < 1$ then $R^0 \to 0$ very slowly.

Figure 19: Limit of large $n$ and small $m$ under weak selection for the Iterated public goods
(IPG) game (Example 2): behavior of solutions for (63) - Part 3. $H(R)$ (strictly decreasing
straight line) and $G(R)$ for $C/B = 0.5$ and $T = 10$ for $\tilde a = 0.01, 0.1, 0.3, 0.4, 0.5$ from
right to left in Panel A and for $\tilde a = 0.5, 0.6, 0.7, 0.9, 0.999$ from left to right in Panel B.
The graph of $G(R)$ moves upwards for $0 < \tilde a < C/B$ and downwards for $C/B < \tilde a < 1$.
$G(R) \to (T - 1)(BR - C)$ as $\tilde a \to 0$ (dashed magenta line in Panel A). In Panel B it can
be seen that $G(R) \to 0$ as $\tilde a \to 1$.

Figure 20: Limit of large $n$ and small $m$ under weak selection for the Iterated public goods
(IPG) game (Example 2): behavior of solutions for (63) - Part 4. In all panels $C/B = 0.5$.
Panel A depicts $R^0$ as a function of $1/\log(T)$ for $\tilde a = 0.3$ (full black line) and for $\tilde a = 0.7$
(dashed red line). For $0 < \tilde a < C/B$, $R^0 \to (C/B - \tilde a)/(1 - \tilde a)$ (this value is approximately 0.286 for
the case shown). If $C/B < \tilde a < 1$ then $R^0$ converges to 0 very slowly as $T$ increases,
more specifically $R^0 \sim -\log(1 - \tilde a)/\log(T)$ (dotted magenta line). Bottom panels show
the behavior of $G(R)$ as $T$ increases. Panel B: case $\tilde a < C/B$ for, from right to left,
$T = 2, 10, 100, 500$. Panel C: case $\tilde a > C/B$ for $T = 2, 10, 100, 500$, from right to left. $G(R)$
stays at zero for $R = \underline{R} = \max\{(C/B - \tilde a)/(1 - \tilde a), 0\}$ and goes monotonically to infinity for $\underline{R} < R < C/B$.